INTRODUCTION


1. Disagreement

Group decisions are basic components of society. They occur at every
level of social interaction, from the selection of an evening's entertainment
by a casually dating couple to the election of a president by a large nation.
But despite their ubiquitous presence, group decisions are not well understood.
That comment holds both for descriptive theory (the theory of how groups make
decisions) and for prescriptive theory (rules for making rational group
decisions).

This situation contrasts with the state of decision theory for individuals.
There is a fairly rich literature dealing with empirical studies of individual
choice; and the theory of rational individual choice, often called decision
analysis, has made rapid progress in the past quarter of a century. Stemming
primarily from the theory of games, a coherent set of rules has evolved which
has proved highly fruitful in identifying the major elements of individual
choice, and in establishing a framework within which rational individual
decisions can be defined. This framework involves the notions of numerical
value scales (utilities), estimated probabilities (sometimes called subjective
probabilities), and the rule: select the action which maximizes expected utility.

The stumbling block in attempting to extend these notions to group decisions
is the existence of disagreement. If all the members of a group agree on the
salient features of a decision problem, no special difficulties arise. But in
almost all interesting decisions in practice, members of the group can differ
widely on any relevant aspect of the decision problem.

There are two generic bases for disagreement: uncertainty (incomplete
information) and conflicting interests. If crucial aspects of the decision
are poorly understood, then differences of opinion are pretty much inevitable.
In addition, if various courses of action open to the group lead to
differential rewards to members of the group, then the members are likely to
evaluate the courses of action differently.

These two generic causes of disagreement spawn a wide variety of discords.
The following short list is not intended to be comprehensive, but to pinpoint
some of the more critical types. Appended to each type is a touchstone question
which highlights the bone of contention.

1. Point of view. What is the problem?

2. Factual. What is the case?

3. Value. What is worthwhile?

4. Interest. Who gets what?

Point of view differences are the hardest to characterize and the most
difficult to deal with in practice. In any given decision, individuals can and
do have quite different "models" of the situation. In an environmental dispute,
one individual can take an ecological point of view, another an economic, still
another a humanistic or aesthetic, and so on. These differences cannot be summed
up entirely by phrases such as different emphasis, or different beliefs. The
different points of view are different ways of representing the world. A
cognate notion is problem formulation. Different individuals formulate the
decision problem (as they see it) in categories which, taken together, do not
form a coherent structure.

Some of the issues involved in point of view disagreement are discussed in
Chapter II, Section 2 under the topic universe of discourse. Different
individuals can, in effect, be talking in different languages if they bound the
problem differently. For example, one individual can maintain that a given
possibility is irrelevant to some central feature of the decision, and another
individual maintain that on the contrary it is highly relevant, and both be
correct within their respective universes of discourse.

Methods of dealing with point of view differences are not well developed
at the present time. The ancient rule, "First define your terms," is relatively
feeble. It essentially assumes a common universe of discourse. Some technology
exists which is helpful if individuals can formulate their individual models
explicitly. Among these are cluster analysis to generate common sets of
categories, and relevance trees to allow for several levels of aggregation.^
However, the requisite theory to apply these techniques to point of view
disagreement has not been generated, and more to the point, figures of merit for
evaluating the effectiveness of the techniques have not been defined.

In this report, point of view disagreement will usually be sidestepped by
the assumption that the group has already agreed on a common model of the basic
factors in the decision. The formal resolution procedures then deal with the
other types of disagreement which can arise within the common model.

Factual disagreement is the clearest of the four, and the type for which
methods of resolution are most advanced. A major fraction of this report will
be devoted to the topic.

Value and interest differences are easier discussed together, since they
are often confused. In the terminology of decision theory, values relate to
criteria, objectives, or payoffs, whereas interests relate to the allocation
or distribution of rewards. It is possible for two individuals to agree
completely on what is worth having, and disagree completely on who should have
it. In fact, there is a significant inverse relationship between value and
interest disagreement. The greater the disagreement on values, the smaller the
disagreement on interests. The relationship can be illustrated by the old
nursery rhyme about Jack Sprat and his wife. Jack Sprat could eat no fat, and
his wife could eat no lean: complete disagreement about values. As a result,
they could eat the platter clean: no conflict of interest.

Many discussions of value conflict, especially in the economic literature,
obscure the distinction between these two types of disagreement by simplifying
the motivational component of decisions to a single notion, namely preference.
An object A is considered more valuable to individual i than object B if i
prefers A to B (i.e., i would select A over B given a free choice). However,
preference has two "dimensions", value per se and amount. An individual can
prefer item A to item B for either or both of two reasons: item A is a more
valuable kind of item than B, or they are both the same kind of item and A
includes more.

The most intense form of interest disagreement occurs when values are
identical, but there is a scarcity of rewards. In the theory of games, the
sharpest conflict occurs with the zero-sum two-person game where the payoff is
equivalent for each player, but the rules of the game determine that whatever
one player gains the other loses. On the other hand, if individual i does not
want what individual j wants, and vice versa, it is hard to start a quarrel.

The status of value judgments is somewhat up in the air at the present time.
Individuals do disagree on the relative worth of different kinds of rewards
as well as on allocations. However, there is no generally accepted criterion
of correctness of value judgments. In the prevalent view, value judgments are
"ad lib".

2. Resolution of Disagreement

In practical affairs, there are a number of reasons for avoiding, or
resolving, disagreement. Above all, of course, disagreement can be an impediment
to action, providing action requires consent on the part of members of the
group. But in addition, disagreement usually entails costs in delayed action,
and in abrasive interaction. It can lead to conflict, ranging from "verbal
battles" to more violent forms of confrontation. A more insidious kind of cost
can occur in the form of degraded decisions. The resolution process, especially
if it involves so-called compromises, can lead to large biases in the final
choice.

Historically, a number of procedures have evolved to deal with disagreement.*
Some of the more widely practiced are:

1. Dictatorial. One individual makes the decision.

2. Objective. The decision is made according to preestablished rules.

3. Darwinian. The decision is the outcome of competition.

4. Collective. The decision results from amalgamation of the individual
judgments.

The dictatorial solution is by far the most common way of resolving
disagreement. It occurs not only in tyrannies, but in all walks of life. There
is nothing necessarily despotic (i.e., arbitrary) in the notion. The industrial
manager, the government agency head, the head of a household, anyone who can
claim "final authority", is a device to resolve disagreement. In practice, the
"one man" nature of the procedure is obscured by complexity (e.g., the
hierarchical structure of large management staffs) and by the constraints which
the "system" places on the abuse of power.

The dictatorial solution has more than historical usage to justify it.
It is effective, it is efficient, and it has the advantage that it is free
of some of the more vexing conceptual issues of multi-person procedures. Because
of these strong advantages, it is likely to be the most widely used method of
resolving disagreement for some time to come. However, the dictatorial solution
has a number of weaknesses beyond the potential abuse of power. Above all, it
is subject to the biases, limited point of view, and other pathologies of
individual judgment.

*I have omitted from this list the more violent procedures such as physical
coercion, and the more blatantly totalitarian procedures such as information
control, not because they are rare, but because they are outside the scope of
the present treatment, which is limited to cooperative decisions.

Objective resolution of disagreement takes a number of forms. Perhaps
the most relevant to the present discussion is the form exemplified by
institutionalized science. The scientific community has developed a set of
criteria to settle factual issues. An essential element of these criteria is
objectivity.* The rules can be expressed in terms that do not refer to the
individual researcher, i.e., in terms of data, and inferences from data. There
may be some controversy concerning the precision of the criteria, especially
with regard to inferences. But the contrast between the relatively objective,
rule-prescribed procedures of the natural sciences and the fuzzier procedures
in other social domains is clear.

Perhaps the most salient feature of science from the present perspective
is its extraordinary success. By relegating pure debate and personal influence
to the background in settling factual disputes it has exhibited a power to
solidify knowledge in a way that is well beyond reasonable doubt. For this
reason, there appears to be little question that scientific knowledge is the
most excellent kind of information that can be input into a decision, when it
is available.

There is only one serious weakness of the scientific method for most
decisions; namely, it is incomplete. If a firm scientific basis can be found
for an assertion, it is a valuable input to any decision for which it is
relevant. But if a firm scientific basis is not available, then the statement
remains in a scientific limbo; it is simply unproved. For most interesting
decisions, a large proportion of factual issues are in this scientific limbo.

*Some scientists would claim that intersubjectivity is all that is required.
Whether intersubjectivity can be achieved without objective reference points
is a moot subject.

The only dubiety here is a matter of relative solidity. Much of the "know-how"
of technology also has a high order of credence, even though it does not have
the overt validation structure of systematic science. The technologist's test
"Does it work?" appears to be about as powerful as the scientist's "Is it
substantiated by experiment?" in weeding out groundless beliefs.

Scientists have imposed another incompleteness on their method, namely
the contention that science can say nothing about values. There appears to be
an active debate beginning on this subject within some scientific communities.
For the time being, however, there are no value judgments that can claim the
"official" sanction of scientific method.

The term Darwinian refers to a wide variety of types of disagreement
resolution that involve competition. Perhaps the purest example is the debate,
where two individuals present as powerful an array of arguments for and against
a given statement as they can, and "the best man wins". The typical formal
debate requires a judge (or judges) who is, in effect, the agency of
resolution. In more general settings, expressed by phrases such as the
marketplace of ideas, the intellectual forum, and the like, the role of judge
is presumably taken by a loosely defined interested community.

For more narrowly defined group decisions, the group itself may be the
judge, and may also include the contenders. Resolution is by "consensus", a
somewhat vaguely defined process including, usually, face-to-face discussion,
various forms of mutual persuasion, and other influences which may lead to
agreement.

I have labeled this fuzzy class of procedures Darwinian because of the
implied assumption that the competition leads to "survival of the fittest",
i.e., that the most excellent judgment is the one that wins out. This assumption
appears to be more an article of faith than the result of careful evaluation.
There are serious problems in designing experiments to evaluate the
effectiveness of competitive processes for selecting the best (e.g., the most
accurate) judgment out of a list of contenders. Nevertheless the question
whether competitive processes are effective is an empirical one. My own
attitude, based on a few experiments of my own^2 and after surveying the rather
sparse literature on the subject, is that competitive procedures are probably
better than dictatorial ones, at least on the average. However, if competition
is viewed as a filtering process, then my impression is that the efficiency of
filtering per stage is rather low.

The first three methods of resolution are roughly ways of selecting one
judgment out of a group of judgments. One somewhat vague rationale that can be
forwarded on their behalf is that, given disagreement, there is one judgment
that is correct, and the others wrong; or somewhat weaker, there is one
judgment which is better than the others, and the goal is to find that judgment.

The fourth method has a different rationale. It starts from the assumption
that if there is major disagreement, especially within a knowledgeable group,
then in all likelihood, none of the members of the group knows the answer to the
question. In such a case, rather than selecting a single answer, more can be
gained by amalgamating all the answers; hence the term collective.

Methods of amalgamating individual judgments are in an early stage of
development. Procedures which are implementable in practice come down to some
form of measure of central tendency (mean, median, geometric mean, etc.) with a
measure of dispersion (standard deviation, interquartile range, etc.) to
indicate the degree of disagreement. However, in theory at least, more
sophisticated methods of pooling individual judgments are possible, and are
discussed in Chapter V.
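The practical procedures just listed can be sketched in a few lines of Python.
This is only an illustration: the function name summarize_judgments and the
five estimates are hypothetical, and the estimates are assumed to be positive
numbers so that the geometric mean is defined.

```python
import statistics


def summarize_judgments(estimates):
    """Central tendency and dispersion for a list of positive numeric estimates."""
    # quantiles(n=4) returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(estimates, n=4)
    return {
        "mean": statistics.mean(estimates),
        "median": statistics.median(estimates),
        "geometric_mean": statistics.geometric_mean(estimates),
        "stdev": statistics.stdev(estimates),  # sample standard deviation
        "iqr": q3 - q1,                        # interquartile range
    }


if __name__ == "__main__":
    # Hypothetical estimates from five group members (assumed data).
    estimates = [120.0, 150.0, 160.0, 200.0, 400.0]
    for name, value in summarize_judgments(estimates).items():
        print(f"{name}: {value:.1f}")
```

With one outlying estimate, as in the assumed data, the median and geometric
mean move less than the arithmetic mean, which is one reason a group might
report more than one measure of central tendency.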

Most of the results which form the body of this report are presented
within the framework of the collective approach to disagreement resolution.
3. The Emerson Principle

The aggregation problem can be expressed formally as follows: there is a
group of individuals who, on a given subject, have a set of judgments $J_i$,
where the index refers to individual i. To obtain a group judgment on the same
topic, there is a function $F(J)$, $J = (J_1, \ldots, J_n)$, which aggregates
the set of n individual judgments into a single group judgment. The function F
should fulfill some straightforward conditions:

1. Substantive Conditions. $F(J)$ should be the same sort of judgment as
the $J_i$. Example: If the $J_i$ are probabilities for a given event, then
$F(J)$ should be a probability.

2. Consistency Conditions. Consistency here refers to coherence
between the individual judgments and the group judgment. Consistency
at the individual or group level is part of the substantive conditions.
Example: If all the members of the group are in agreement,
$J_i = J_k$ for all i and k, then $F(J) = J_i$ (the unanimity principle).

3. Performance Conditions. If there is a figure of merit for the
individual judgments, then $F(J)$ should not perform poorly with respect
to this figure of merit. Example: If $J_i$ is individual i's answers
on a test, and each individual gets a high score on the test, then
$F(J)$ should not get a low score.
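The first two conditions can be checked mechanically for a candidate
aggregation function. The sketch below assumes that the judgments are
probabilities for a single event. The helper names and the deliberately poor
aggregator summed are hypothetical, introduced only for illustration.

```python
from statistics import mean


def satisfies_substantive(F, judgments):
    """Substantive condition for probabilities: F(J) must itself be a probability."""
    return 0.0 <= F(judgments) <= 1.0


def satisfies_unanimity(F, shared_value, n):
    """Consistency example: if all n members report the same value, F(J) returns it."""
    return abs(F([shared_value] * n) - shared_value) < 1e-12


def summed(judgments):
    """A deliberately poor aggregator, included only as a counterexample."""
    return sum(judgments)


if __name__ == "__main__":
    judgments = [0.4, 0.7]  # assumed individual probabilities
    for F in (mean, summed):
        print(F.__name__,
              satisfies_substantive(F, judgments),
              satisfies_unanimity(F, 0.4, 3))
```

The arithmetic mean passes both checks, while the sum fails both. A check of
this kind samples particular inputs, so it can refute a candidate F but cannot
prove that a condition holds in general.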

Conditions of type 3 have not received a great deal of attention in the
literature on group decisions, primarily because those of type 2 already appear
to pose insurmountable difficulties. Probably the best known of these
difficulties is the result derived by Kenneth Arrow that, if the $J_i$ are
individual preference relations, there is no F which fulfills a few, highly
plausible, consistency conditions. This result is discussed in some detail in
Chapter VI. A similar difficulty is exemplified by a result I demonstrated some
time ago to the effect that if the $J_i$ are probabilities, then no F exists
which fulfills the usual axioms of probability for both the individual
judgments and the group judgment. This result is expounded in Chapter V,
Section 5.
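The flavor of the difficulty with probabilities can be seen in a small
numerical sketch. It is not the result cited above, only an illustration under
assumed numbers: two individuals who are each internally coherent are pooled
first by the arithmetic mean and then by the geometric mean, and in each case
one of the usual rules of probability fails for the pooled judgments.

```python
from math import isclose, sqrt

# Hypothetical judgments of two individuals about events A and B.
# Each individual is coherent: P(A and B) = P(A) * P(B | A).
individuals = [
    {"p_a": 0.2, "p_b_given_a": 0.5},
    {"p_a": 0.8, "p_b_given_a": 0.9},
]
for person in individuals:
    person["p_a_and_b"] = person["p_a"] * person["p_b_given_a"]


def average(key):
    """Arithmetic mean of one judgment across the individuals."""
    return sum(person[key] for person in individuals) / len(individuals)


# Arithmetic pooling, applied to each judgment separately.
group_p_a = average("p_a")
group_p_b_given_a = average("p_b_given_a")
group_p_a_and_b = average("p_a_and_b")

# Does the product rule survive pooling?
product_rule_holds = isclose(group_p_a_and_b, group_p_a * group_p_b_given_a)

# Geometric pooling of P(A) and P(not A): do they still sum to one?
geo_p_a = sqrt(0.2 * 0.8)
geo_p_not_a = sqrt((1 - 0.2) * (1 - 0.8))
additivity_holds = isclose(geo_p_a + geo_p_not_a, 1.0)

print("product rule holds:", product_rule_holds)
print("additivity holds:", additivity_holds)
```

With these numbers the averaged joint probability is 0.41 while the product of
the averaged marginal and conditional is 0.35, and the geometrically pooled
probabilities of A and not-A sum to 0.8 rather than 1.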

A basic theme of the collective resolution of disagreement is that
conditions of type three can compensate for difficulties with conditions of
type two. The idea is straightforward: if a group judgment can be shown to
perform well on a given figure of merit, then a certain amount of
non-conformity between individual and group judgments is tolerable. I have
called this the Emerson Principle: performance is at least as important a
criterion for aggregation as consistency.*

*Historically, there has been a wide range of reactions to the discovery of
inconsistencies, from panic to stubborn unconcern. The story has it that the
logician Frege died of a heart attack when Bertrand Russell informed him of
the paradox of the class of all classes which do not contain themselves.
But mathematicians continued to use the notion of a differential despite
Bishop Berkeley's slashing attack. Zero gradually achieved the status of a
full fledged number even though contradictions can be derived if it is
"misused". In the case of the differential and zero, the concepts were judged
by the mathematical community to be more useful than dangerous.

To invoke the Emerson Principle, it is necessary to have a well-defined
figure of merit that applies both to individual judgments and to group judgments.
For factual judgments, there is a large family of figures of merit, or scores,
which enable comparing the performance of individual and group estimates. This
is the topic of Chapter III. Using these scores it is possible to derive a
corresponding family of n-heads rules, i.e., statements to the effect that the
group score is better, in some well-specified sense, than the corresponding
individual scores. This is the topic of Chapter V. The n-heads rules appear
to be a satisfactory justification for using group estimates in decisions
where the individual members disagree on factual issues. This "resolution" of
disagreement is a good deal stronger than simply finding a "compatible" group
judgment. In many cases the group judgment is better than the typical
individual judgment; and in theory at least, the group judgment can be better
than the judgment of any member of the group.
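One familiar instance of a statement of this kind can be sketched in Python.
The choice of squared error as the figure of merit and of the arithmetic mean
as the group estimate are assumptions made for illustration; they are not
necessarily the scores or rules developed in Chapters III and V. For this
pairing, the error of the group mean never exceeds the average of the
individual errors, because squared error is a convex function of the estimate.

```python
from statistics import mean


def squared_error(estimate, truth):
    """Assumed figure of merit: smaller is better."""
    return (estimate - truth) ** 2


def n_heads_comparison(estimates, truth):
    """Return (error of the group mean, average individual error)."""
    group_error = squared_error(mean(estimates), truth)
    average_individual_error = mean(squared_error(e, truth) for e in estimates)
    return group_error, average_individual_error


if __name__ == "__main__":
    # Hypothetical estimates of a quantity whose true value is 10.
    print(n_heads_comparison([8.0, 12.0, 15.0, 9.0], 10.0))
    # Two members erring in opposite directions: the group beats both.
    print(n_heads_comparison([8.0, 12.0], 10.0))
```

In the second call each individual has squared error 4 while the group mean
has error 0, a small case in which the group judgment is better than the
judgment of any member.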

4. Value-Interest Disagreement

When we turn from factual estimates to value judgments or conflicts of
interest, as has been noted previously, there are no agreed on figures of merit
which apply equally to individuals and to the group. Thus, the Emerson
Principle cannot be used to sidestep the consistency conditions. As it turns
out, there is a fairly straightforward resolution of inconsistencies of the
Arrow type which does not depend on performance criteria. If individual
preferences are expressed as ordinal scales (i.e., some set of objects is
selected as a reference set and preferences for other objects expressed by
their location in the scale formed by this reference set) then it is feasible
to construct a group preference scale that is compatible with the individual
scales. Demonstration of this possibility is a principal topic of Chapter VI.

Since reference objects have a number of desirable features in themselves
(they assure the stability of individual preferences, and form the bases for
extending preferences to more numerical kinds of measurement), introducing them
into the formal apparatus of decisionmaking appears to have multiple advantages
beyond simply allowing consistent group preference scales.

In the absence of a figure of merit for group value judgments, it is not
possible to assert that the group will be "better off" if it uses collective
value judgments. There is a weaker form of n-heads rule that can be derived
for collective value judgments, but an additional notion is required, namely
the notion of cooperative decisions.

It is useful at this point to have some additional terminology. An
individual decision can be analyzed with the help of a decision matrix,
illustrated in Figure 1. There is a list of potential actions
$A = (A_1, A_2, \ldots)$ (strategies, plans, policies, etc.) among which the
individual can choose. There are two properties required of this list: (a) each
action must be feasible, i.e., the individual must be able to carry out any
action which he selects, and (b) the individual must be able to select one
action out of the list (the "free will" condition). The result of taking a
given action is dependent upon a set of contingencies
$\{E_j\} = (E_1, E_2, \ldots)$ (states of the world, uncontrolled events, etc.).
The outcome of selecting action $A_i$ and the occurrence of contingency $E_j$
is designated $O_{ij}$. The set of contingencies is taken to be an event space
in the probability sense, i.e., there is a probability distribution $P(E_j)$
that any given contingency $E_j$ will occur, where these probabilities do not
depend on the action taken.

To complete the analysis, it is assumed that there is a value function
(utility or payoff function) $V(O_{ij})$ which defines the value of the outcome
$O_{ij}$ to the individual.

The decision rule for a decision expressed by a decision matrix is: select
the action $A_i$ which maximizes the expected value $\sum_j P(E_j) V(O_{ij})$.

In the individual case, the value function V is interpreted as the value
to the individual of a given outcome, and the probabilities $P(E_j)$ as the
probabilities as seen by the individual.
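The decision rule for an individual can be written out directly. In the sketch
below, values[i][j] stands for the value of the outcome of action i under
contingency j, and probabilities[j] for the probability of contingency j. The
function names and the two-by-two example are hypothetical.

```python
def expected_values(probabilities, values):
    """Expected value of each action: sum over j of P(E_j) * V(O_ij)."""
    return [sum(p * v for p, v in zip(probabilities, row)) for row in values]


def best_action(actions, probabilities, values):
    """Return the action with the highest expected value, and that value."""
    scores = expected_values(probabilities, values)
    best_index = max(range(len(actions)), key=scores.__getitem__)
    return actions[best_index], scores[best_index]


if __name__ == "__main__":
    actions = ["A1", "A2"]       # assumed list of feasible actions
    probabilities = [0.3, 0.7]   # assumed P(E_1), P(E_2)
    values = [[5.0, 3.0],        # assumed V(O_11), V(O_12)
              [0.0, 10.0]]       # assumed V(O_21), V(O_22)
    print(best_action(actions, probabilities, values))
```

Here A1 has expected value 3.6 and A2 has expected value 7.0, so the rule
selects A2. Ties are broken in favor of the action listed first, which is a
choice of this sketch rather than part of the rule.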

*In some forms of decision analysis, the set of actions may be extended to a
tree, i.e., a branching process in which options at a later stage are
dependent on what has occurred before. Although this more extensive model
has a number of valuable features, the critical issues for group decisions
can be discussed using the simpler matrix description.

**In the tree version, the probabilities of events can depend on previous
actions. Again, this more general possibility is not needed for most of the
following discussion.

In the group situation, each individual has his own matrix: a set of actions
that he can take, and a set of contingencies which he perceives to be relevant.
But now, in addition to the contingencies, the outcomes are determined by the
actions of the other members of the group. The situation resembles a game in
the sense of von Neumann and Morgenstern^3, but is a little more general. In
a von Neumann and Morgenstern game, the set of contingencies and the outcome
matrix are common to all the players.

Since in this disaggregated case there is no common value function, and no
common set of probabilities, there is no direct generalization of the
maximization rule which defines a group decision rule.

A basic simplification of the analysis is obtained if attention is limited
to cooperative decisions. A cooperative decision is defined as one in which
the group is committed beforehand to selecting a common course of action. In
the terminology of game theory, the group selects a coordinated strategy.
In other words, in a cooperative decision, the potential individual courses of
action are compiled into a single list of potential group actions. To this
extent, then, the group decision is simplified to something more closely
resembling an individual choice, i.e., the choice of an action out of a single
list of actions.

Limiting attention to cooperative decisions omits a number of group
processes that are relevant to group decision analysis. In particular, it
slides over the question how the group "decides" to take a common action.
However, a broad area of important types of decisions remains. Typical decisions
encountered in business firms and government agencies still remain, as well
as those of most voluntary organizations.

The notion of cooperation defined above is quite narrow. Note that any of
the four resolution techniques described in Section 2 can operate within
cooperative groups as defined. The resolution procedures relate to the way in
which a given course of action is selected; the cooperative assumption merely
determines that the selection will be from a single, common list. Thus, a
group can commit itself to a common action, and still "allow" one individual
to make the selection.

In the dictatorial "solution" to the cooperative decision problem, the
set of contingencies, their probabilities, and the outcomes, are those
perceived as relevant by the single decisionmaker. Similarly, the value function
is one that the decisionmaker finds appropriate. But there is nothing in
the dictatorial solution which says that the value function reflects the
"selfish" interests of the decisionmaker; for most organizational managers,
presumably the welfare of the organization, as well as their own welfare, would
count in their value functions.

An even more drastic simplification is commonly made in formal treatments
of group decisions, namely, that the entire decision matrix, except for
probabilities on contingencies and the value function, is common to all members
of the group. The assumption sweeps most of the problems associated with point
of view disagreement under the rug. There is no good justification for the
assumption, other than the fact that it bypasses many thorny problems. With that
unadorned excuse, the assumption of a common decision matrix will be adopted for
most of the formal models of group decision in this report.

The two assumptions of cooperative actions and a common decision matrix
lead to a greatly simplified arena of disagreement. Disagreement is limited
to probabilities for contingencies and to the value function. The set of
actions {A_i}, the set of contingencies {E_j}, and the outcome matrix ||o_ij||
are identical for all participants. Each individual, however, may have his
own set of estimates for the probabilities of contingencies, and his own value
function over outcomes.

The n-heads rules mentioned earlier furnish a basis for the resolution of
disagreement on probabilities. Ordinal scales, as previously noted, allow the
formulation of a consistent group preference scale. If, in addition, each
individual can express his value function in a numerical form which is linear
in probabilities (technically known as a utility function), then the step to a
numerical group value function is rather small. For example, if it is
assumed that the group, whenever it is indifferent between two outcomes A and B,
is also indifferent between A and any probability combination of A and B, then
the group value function is just a weighted sum of the individual value
functions. By a probability combination is meant, e.g., a lottery in which A will
result with some probability p and B will result with probability 1-p. The
assumption that the group is indifferent between an outcome A and a probability
combination of A and any equivalent outcome B can be called the equivalence
condition.

The assumption of individual utility functions is one that has been rather
generally accepted by decision theorists. The equivalence condition for group
preferences is more controversial. However, it can be bolstered by a form of
n-heads rule. The weighted sum of the individual value functions minimizes the
weighted total regret of the members of the group. An individual's regret is
the difference between what he expects the group can achieve (in terms of his
value function) and his expectation of the value of the action the group
selects. The total group regret is just the sum of the individual regrets.
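
The regret argument can be checked on a small numerical case. The sketch
below is illustrative only. The expected values, the weights, and the names
`expected_value`, `weights`, `group_value`, and `weighted_total_regret` are
assumptions invented for the example, and "what he expects the group can
achieve" is read here as the best expected value the member sees among the
listed actions.

```python
# Hypothetical inputs: expected_value[k][a] is member k's expectation, in
# terms of his own value function, of action a. Not data from the report.
expected_value = [
    [0.9, 0.5, 0.1],   # member 0
    [0.2, 0.7, 0.3],   # member 1
    [0.1, 0.6, 0.8],   # member 2
]
weights = [0.5, 0.3, 0.2]   # assumed weights on the members
actions = range(3)


def group_value(a):
    """Weighted sum of the individual expected values of action a."""
    return sum(w * ev[a] for w, ev in zip(weights, expected_value))


def weighted_total_regret(a):
    """Weighted sum over members of (best achievable - value of action a)."""
    return sum(w * (max(ev) - ev[a]) for w, ev in zip(weights, expected_value))


best_by_value = max(actions, key=group_value)
best_by_regret = min(actions, key=weighted_total_regret)
print(best_by_value, best_by_regret)
```

For every action the weighted total regret equals a constant (the weighted
sum of each member's best achievable value) minus the group value, so the
action that maximizes one necessarily minimizes the other.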

The min total regret result is rather weak since there is no separate
criterion to indicate that the group is "better off" if it adopts a weighted
average of the individual utilities as a group value function. It does give
a stronger justification for the weighted average than the equivalence
condition alone.*

5.  Some Limitations

Two formal limitations on group decisions were proposed above to simplify
procedures for resolution of disagreement, namely, the assumption of a common
point of view, and the assumption of cooperative, i.e., coordinated actions.
In addition, there are some caveats concerning group judgment that are
difficult to characterize in a completely formal fashion. Strictly speaking, these
caveats are not part of formal group decision analysis; however, they are
relevant to applying group decision procedures in practice.

N-heads rules are appropriate where there is no cost-effective way to
obtain a more objective answer to a decision-relevant question. If the answer
to a question can be obtained from well-validated sources, or by relatively
inexpensive data collection, then the objective answer is clearly to be
preferred to group judgment. This consideration does not create any conceptual
problems for group decisions; in theory, at least, each individual should
prefer the more solid objective information to his own judgment.

There is, however, another circumstance in which collective judgment may
not be appropriate which does raise conceptual problems, namely, the case where
the group knows so little about a question that a group judgment may be
"misleading". This case is closely related to the situation variously labelled
in the literature as "radical uncertainty", "unknown factors", or "incomplete
information". For reasons which are still obscure, if an individual is poorly
informed on a given question he is likely to give a biased answer. Bias, in
this context, does not imply distortion by the interests of the individual,
although that factor may play a role, but only implies a systematic deviation
from the true answer. Another way to express the same phenomenon is that if
the individual is sufficiently poorly informed, he can be a counterpredictor;
if he asserts A, then not-A is more likely to be true than A. Under these
circumstances, for a yes-no question, flipping a fair coin will give a more
accurate answer (on the average) than the individual's best guess.

* In the demonstration of the possibility of a consistent group preference scale,
and the demonstration of the min regret result, values and interests are not
separated. Each individual is assumed to have either a single preference scale,
or a single utility function, which expresses both values and interests. There
is some reason for believing that separating values and interests can simplify
resolution of disagreement, at least on values, and possibly on interests as
well. Exploiting this possibility requires developing a group theory for
multidimensional criteria. This topic is too extensive to include in the present
report and will be treated in a separate publication.

The elementary n-heads rules apply to any set of judgments. Thus, no
matter how poor the individual judgments, the error of the mean will always be
less than or equal to the average individual error. This result does not
guarantee, however, that the mean is free from bias. The group can be a
counterpredictor as well as the individual members. The question thus arises
whether there are circumstances under which, crudely speaking, the group would be
"better off" to use a random device to obtain an answer to a decision-relevant
question.
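
The elementary n-heads property and its limitation can be illustrated with
a short sketch. It assumes that "error" means absolute deviation from the
true answer (the inequality then follows from the triangle inequality), and
the figures and the names `truth`, `spread`, and `biased` are made up for the
illustration.

```python
from statistics import mean


def error_of_mean(estimates, truth):
    """Absolute error of the group mean."""
    return abs(mean(estimates) - truth)


def average_individual_error(estimates, truth):
    """Mean of the individual absolute errors."""
    return mean(abs(x - truth) for x in estimates)


truth = 5000.0                               # hypothetical true answer
spread = [3000.0, 4500.0, 6000.0, 9000.0]    # estimates on both sides of truth
biased = [6000.0, 7000.0, 8000.0, 9000.0]    # every estimate too high

for label, estimates in (("spread", spread), ("biased", biased)):
    print(label,
          error_of_mean(estimates, truth),
          average_individual_error(estimates, truth))
```

When the estimates straddle the true answer, the mean does better than the
average member. When every member errs in the same direction, the mean is
no better than the average member, and the shared bias is left untouched.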

This topic is the subject of Chapter IV. In theory, both for individuals
and groups, there are questions for which a better answer can be obtained by some
variation on random choice. I have labelled such choices nominal estimates,
i.e., estimates obtained by formula rather than by judgment. The practical
problem raised by the possibility that nominal estimates can perform better than
judgment is to find an indicator, i.e., some well-defined method of identifying
the circumstances under which nominal estimates are called for.

Group judgment is to some extent a guard against individual bias. In
Chapter V, the data from a study of probability estimates by a professional
group is analyzed to show that individual judgments can perform more poorly
than chance, whereas the group does better than chance. Thus, any indicator
would have to take into account the fact that the group compensates in part
for individual error.

There are two potential indices that might be used as indicators of
questions for which the group is a counterpredictor. One is the dispersion,
as measured, e.g., by the standard deviation of the individual responses. The
other is a self-rating, i.e., an appraisal by each individual of the degree
of confidence he has in his estimate. In experiments, both the standard
deviation and the average of individual self-ratings have shown high correlations
with the accuracy of group estimates. However, there are difficult problems
of calibrating these indices so that a given standard deviation or a given
average self-rating will indicate the same degree of information deficit across
a variety of types of questions.
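
As a rough illustration of the two indices and of the calibration problem,
the sketch below computes both for two hypothetical questions. The
responses, the three-point self-rating scale, and the function name `indices`
are assumptions made for the example only.

```python
from statistics import mean, stdev


def indices(responses, self_ratings):
    """Return the two candidate indicators for one question."""
    return {
        "dispersion": stdev(responses),          # sample standard deviation
        "mean_self_rating": mean(self_ratings),  # average stated confidence
    }


# Hypothetical group of three answering a price question (dollars) and a
# probability question, with the same self-ratings on an assumed 1-3 scale.
price_question = indices([4000.0, 5000.0, 6000.0], [2, 3, 2])
probability_question = indices([0.4, 0.5, 0.6], [2, 3, 2])

print(price_question)
print(probability_question)
```

The two sets of responses have the same relative spread and the same
self-ratings, yet the raw standard deviations differ by a factor of ten
thousand. A given standard deviation therefore cannot be read as the same
degree of information deficit for both types of question without some
further calibration.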

Chapter IV investigates the phenomenon of counterprediction and explores
some plausible rules for generating nominal estimates. For one body of data,
a form of nominal judgment (uniform weights for linear estimation) generates
more accurate estimates than either individual or group judgment. To this
extent, the potential value of nominal estimates in cases of low information
can be illustrated. But the material in Chapter IV does not resolve the issue
of specifying an indicator of counterprediction, nor does it give a complete
set of criteria for selecting a nominal rule in practice. Chapter IV is thus
an initial excursion into an area that requires additional research.

6.  Summary Comments

The advantages of physical cooperation among individuals have long been
apparent. Tasks which are impossible for individuals to perform can be carried
out by teams of individuals. When specialization and division of labor are
added to collective effort an even wider range of beneficial activities becomes
has decided to adopt a collective decision procedure, the minimum regret rule
is a non-trivial incentive for choosing the weighted average value function.

The approach to group decision elaborated in the chapters which follow is
necessarily elementary in character and scope. It appears likely that more
powerful methods of aggregating individual judgments, and more intuitively
plausible bases for generating group value functions, will be uncovered in the
near future. The collective approach to group decisions appears to offer a
framework in which such improvements can be meaningfully pursued.
CHAPTER II.  INDIVIDUAL ESTIMATION


1.  The Estimation Process

Although this book is primarily concerned with group judgment, some
attention must be given to the role of the individual. Individual judgments are
the basic ingredients of the group process. For decisions involving
uncertainty and disagreement, the quality of the individual judgments is a
fundamental limiting factor on the quality of the group decision. A basic theme of
the book is various possibilities for using the group process to improve
individual judgments. But roughly speaking, if the individual judgments are poor,
the group judgment will be at best a little less poor. The old adage "You
can't make a silk purse out of a sow's ear" is about as applicable to
improving judgments as adages normally are to anything.

In this Chapter I will be examining rather elementary kinds of individual
judgments, those which are roughly equivalent to simple declarative statements.
In addition, the discussion will be restricted to factual judgments. There
exists a fair amount of theory and some relevant experimental data concerning
such judgments. In the following exposition I have been highly selective,
dealing only with those approaches which I have found valuable for assessing
group decisions. There is a much richer body of theory and experiment dealing
with cognitive psychology that is in some sense relevant. Hopefully someday
all of that will help illuminate the stubborn obscurities that plague the
topic.

It is helpful to have a crude diagram of what is involved in making an
estimate. A common type of judgment is that in which the individual is asked
a specific, numerical question, where he doesn't know the answer, but can make
an "educated guess." That last feature implies that the individual has some
relevant information "in his head," and that the information is sufficient
to generate a properly formatted answer to the question. Consider the
following example which I cooked up to provide a vehicle for introspecting about
what goes on in making an estimate. "What is the cost of a young, well-
trained elephant, FOB Thailand?" If your background is anything like mine,
you won't know the answer to that question; and yet with no great effort you
can come up with a number. You may not be happy with the number, but that's
another matter. In my case, the number that came into my head was $6,000.

A crude  diagram  is  suggestive. 


The question triggers the recall of related information. The material
in the amorphous box has been labelled "Information and other stuff" for the
obvious reason that the question "brings to mind" (at least to mine!) a quite
amazing variety of relevant and not so relevant material. When I thought up
the elephant question, images of steaming jungles, turbanned mahouts, a scene
from the movie "Around the World in Eighty Days," teak logs being trunkled
into piles, and a great deal more of highly colorful mental imagery flooded
in. Somehow, all of that added up to $6,000.

The example suggests some important considerations with respect to the
idea of using individual judgment as a surrogate for data. Even for uncertain
questions, a great deal of miscellaneous and possibly low-quality material
exists in the minds of suitable individuals. But none of it answers the
question directly. The individual is needed to recall that material, and
more importantly, to fashion it into a reply to the question. It is clear
that this process must be at least as complex as diagrammed in Figure 2.

A phone call to the Thai consulate in Los Angeles elicited an estimate of
$5,000 as a "round figure."


PERCEIVE AND UNDERSTAND THE QUESTION

MEMORY SEARCH FOR RELATED MATERIAL

EVALUATE RETRIEVED MATERIAL
(a)  RELEVANCE
(b)  SOLIDITY

GENERATE AN ESTIMATE


Figure 2.  Basic Processes in Estimation.

The first step, perceive and understand the question, is probably a
complex operation by itself. There is some reason to believe that in order to
understand a question, the individual must have some relevant information.*
Thus understanding probably interacts with the second step, retrieving related
material. One of the gaping holes in the theories of estimation that will be
elaborated below is the lack of any substantial treatment of these first two
steps. Tversky's concept of anchoring suggests that the process is crucially
affected by the way it gets started.

As in the elephant example, the products of recall need evaluation and
screening. There appear to be at least two basic criteria: (a) relevance: is
the material useful in answering the question? (b) solidity: is the material
substantial, or is it "flimsy?" The solidity rating has been dealt
with in the literature, for example, under the topic "grounds for belief."
A representative list of pertinent factors might be: the individual's direct
experiences, congruence with other attitudes, consistency with the beliefs
of acquaintances, perceived authority. In my own experience all of this is
modulated by the question whether the recall is clear and strong or fuzzy and
weak. There is a host of Ph.D. theses waiting to be generated focussing on the
identification and measurement of the factors affecting the solidity rating.

* Cf. the discussion of universe of discourse in Section 2 below.

Several indices related to the solidity rating will be examined in the
sections dealing with specific models of estimation. Those include
probability estimates, self-ratings, and confidence ranges. Unfortunately, most
of these indices have been studied only for the final judgment, and not for
the "ingredients." In the general case, where the judgment is not part of
the individual's repertory, but must be made up for the occasion, some of the
material that is retrieved may be from old wives' tales, from fiction, or
from one's own imagination. (I had to reluctantly discard the scene from
"Around the World in Eighty Days" as fiction; reluctantly, because it was the
only thing that came to mind that had a relevant number in it.) Some of the
rest may be dubious, and hopefully some is fairly firm.

I sometimes refer to the final step, the generation of the estimate, as
a minor miracle. Out of the culled heap of mostly qualitative matter, an
estimate appears: in the elephant case a precise, though pretty shaky number.
Equally amazing is the swiftness with which all of this occurs: a little less
than 30 seconds for the price of the elephant. Figure 3 shows the results of
a sequence of experiments on timed estimates. Several groups of upper class
and graduate students were asked a series of general information questions
not unlike the elephant question (typical: "How many girls in the United
States under the age of nineteen gave their status as divorced in the 1970
census?") They were given various time intervals in which to read and respond
to the question, ranging from 15 seconds to four minutes. For many of the
questions, it required nearly fifteen seconds just to read the question. The
graph shows error as a function of time allowed. There is a clear minimum
at thirty seconds. Longer time spent in thinking about the question led to
less accurate answers. The data for four minutes is not shown because most
of the students found they could not think productively about the questions
for as long as four minutes.

Figure 3.  Effect of Time to Respond

The reentrant arrows in Figure 2 are to suggest that there may be
feedback loops. The evaluation step may initiate further search, especially if
the initial material is discarded. Even the estimation step may go all the
way back to the original understanding step if the number that comes to mind
is "absurd."

Figure 2 is not intended to be a precise description of the estimation
process. It presents a crude classification of some of the basic features.
More analytic models of the process will be taken up below.

2.  Estimation  Space 

Before taking up theories of estimation, it is useful to have a certain
amount of notation. Theoretically, an estimate can be the reply to any
question, and can range from a simple "yes" or "no" to the equivalent of a book:
"What do you think the world will be like in the year 2050?" Of necessity,
the present discussion must be limited to questions less global than a world
forecast. To the extent possible, overt replies to questions will be
designated by the letter R (R for "response" or "reply"), with various
subscripts to indicate who is responding and about what. The letter Q will
designate the belief of an individual, which may or may not be the same as
R. The letter I will designate information, usually the information on which
a belief (and where appropriate, a reply) is based. The notation (Q|I) and
(R|I) will designate the relationship that Q or R is based on I. This
relationship is not well defined, a fact that will be occasionally embarrassing.
At times the relationship will be treated as a relative probability, indicated
by P(Q|I), the probability of Q given I, but usually the relationship is not
that of a probability.

In addition to the individual's estimate there is a true answer to the
question. Loosely, we say the individual response is correct if it
corresponds to the true answer. In general, the true answer will be designated by
variants of the letter T. Thus, if the response is a magnitude estimate like
the price of the elephant, then T is the actual price. The notation becomes
more complex (and even controversial) for some types of estimates, especially
probability judgments. This is further compounded by the fact that for the
interesting cases the true answer is unknown. This topic will be elaborated
in Chapter III.

A question implicitly determines a set of possibilities concerning the
world and a set of possible responses. These can sometimes have identical
structures. The question "Will it rain tomorrow?" identifies two possible
states of the world (rain or no rain) and two possible replies, equivalent
to "rain" and "no rain." However, the question "What is the probability of
rain tomorrow?" can be interpreted in two ways. On one interpretation, there
are still the same two possibilities, rain or no rain, but an infinite number
of possible replies, namely, any number between 0 and 1. On the other
interpretation, the infinite set of potential probabilities are possible states
of the world. The set of possibilities concerning the world will be referred
to by the slightly fancy term "event space," and the set of possible replies
by the term "response space." The event space will be designated by E, again
with subscripts to indicate specific events, or sometimes by U (universe of
discourse). The response space will be designated by R.

A basic property of either event or response spaces is the amount of
structure that is imposed on the space (usually by definition) prior to asking
a question. The simplest structure for either is a list of miscellaneous
possibilities. For example, the question "Who will be the next president of
the United States?" may refer to a list of several names. Clearly, the order
of the list doesn't matter. However, even such an elementary set of
possibilities will be expected to have a minimum of structure; namely, the items
on the list will be considered separate or exclusive. This is not logically
necessary, of course, but for most practical situations, it would be awkward
to have two different items (two different labels referring to the same
alternative).

The structure of an event space can range all the way from the simple
list to highly complex mathematical frameworks, e.g., the input-output
coefficients of the world economy in the year 2025.

There is usually a strong coupling between the event space and the
response space. They may have the same structure, or if not, the former
sharply delimits the latter. One of the thorny issues in estimation theory
is how to deal with the real life situation where the coupling breaks down.

To the theorist a reply to the question "What is the probability of rain
tomorrow?" like "The probability of rain is .7 and the probability of no rain
is .5," won't do. It is logically unacceptable that the probabilities of two
exclusive events add to more than 1. Nevertheless, such "inconsistent"
responses are encountered frequently in the psychological laboratory, and in
councils of industry and government, usually not in such a bare form, but
often only thinly veiled. Generally I will assume that the logical properties
of the response space are consistent with those of the event space; but for
a general theory it is necessary to allow the possibility that the estimator
does not know (or "slips up on") the structure of the event space.
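
A check of this kind of consistency is easy to state explicitly. The
following is a minimal sketch, assuming the reply is given as a probability
for each of a set of mutually exclusive events; the function name
`is_consistent`, the `exhaustive` flag, the tolerance, and the sample numbers
are all hypothetical.

```python
def is_consistent(reply, exhaustive=True, tol=1e-9):
    """Check a probability reply over mutually exclusive events.

    reply maps each event label to the stated probability. If the events
    are also exhaustive the probabilities must sum to 1; otherwise the sum
    may not exceed 1.
    """
    values = list(reply.values())
    if any(p < 0.0 or p > 1.0 for p in values):
        return False
    total = sum(values)
    if exhaustive:
        return abs(total - 1.0) <= tol
    return total <= 1.0 + tol


# Illustrative replies to "What is the probability of rain tomorrow?"
print(is_consistent({"rain": 0.7, "no rain": 0.5}))   # sums to more than 1
print(is_consistent({"rain": 0.7, "no rain": 0.3}))
```

A check like this detects the bare form of the inconsistency only. The
thinly veiled forms mentioned above would first have to be translated into
statements about a common event space.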

Another relevant aspect of event spaces relates to the numerical
properties of the structure. It is often convenient, and at times essential, to
quantify the event space, i.e., to describe the set of possibilities by one
or more scales. Usually this is done as a matter of course where the question
is inherently numerical. But in many instances, quantification has additional
value within the context of group decisions. It is much easier to aggregate
numerical judgments than purely verbal statements.

The topic of quantification is a whole field of investigation of its
own. For the purposes of this book, the central issue is the degree to
which available procedures justify the mathematical operations performed on
the numbers. A common classification of the possibilities is:

1.  Nominal scales. A nominal scale consists of the assignment of
numbers to items on a (mathematically) arbitrary basis, to be used as tags or
names. For example, the list of presidential candidates could be numbered
"in order" and thereafter the candidates referred to as "number one" etc.
The numbers assigned in this fashion are traditionally assumed to have no
mathematically interesting properties, other than being distinct. I am
inclined to think this attitude overlooks some of the advantages of nominal
scales. E.g., a nominal scale can furnish counts ("have we voted on all the
candidates?"). But from the standpoint of a scale of measurement, the nominal
assignment of numbers is not equivalent to the imputation of some quantity
to the items.

2.  Ordinal scales. An ordinal scale consists of a relation which puts
the items in a well-defined sequence. Typical relations are: greater than,
better than, later than, more costly than, etc. Numbers may be attached to
the items, to form an ordinal scale. If N(x) is the number attached to item
x and N(y) is the number attached to y, and x has the given relation to y,
then N(x) > N(y). In general this is the only restriction on the numbers
N(x), so that any other set of numbers fulfilling the condition is an
"equivalent" scale. In technical terms, the number assignment is determined
only up to a monotonic transformation.
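
The practical consequence of "only up to a monotonic transformation" can be
seen in a small sketch. The items, the numbers attached to them, and the
two groups of judgments are hypothetical; cubing is used simply as one
example of an increasing transformation.

```python
# Two equivalent ordinal scales for the same four items: the second is a
# monotonic (increasing) transformation of the first.
scale = {"a": 1, "b": 2, "c": 3, "d": 4}
cubed = {item: n ** 3 for item, n in scale.items()}


def ranking(numbers):
    """Items ordered from lowest to highest attached number."""
    return sorted(numbers, key=numbers.get)


def mean_score(numbers, items):
    """Arithmetic mean of the numbers attached to the chosen items."""
    return sum(numbers[i] for i in items) / len(items)


group_1 = ["a", "d"]   # hypothetical judgments of one group
group_2 = ["c", "c"]   # hypothetical judgments of another group

print(ranking(scale) == ranking(cubed))
print(mean_score(scale, group_1), mean_score(scale, group_2))
print(mean_score(cubed, group_1), mean_score(cubed, group_2))
```

Both scales order the items identically, yet the comparison of group means
reverses from one to the other. Averaging is therefore not an operation
that an ordinal scale, by itself, justifies.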

3.  Interval scales. With interval scales, the differences between any
two numbers are ordered. In effect the ratio

    [N(x) - N(y)] / [N(z) - N(w)]

for any two pairs of items (x,y) and (z,w) is fixed. In technical terms the
scale is fixed up to a linear transformation; i.e., if N(x) is an interval
scale, then AN(x) + B, where A is a positive constant and B is any constant,
is an equivalent scale. A typical example is the ordinary scale of
temperature, which is fixed up to two reference points. Various scales of temperature
are possible depending on the choice of reference points. The freezing point
and boiling point of water (at sea level) are the common choice for everyday
scales. Even with the selection of reference points, there is still the
freedom of assigning numbers to these. We have two common scales, the
Fahrenheit and Celsius scales. The former assigns 32° and 212° to the two
points. The latter assigns 0° and 100°.
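
The invariance of the ratio of differences can be verified directly on the
two temperature scales. The sketch below uses the standard
Celsius-to-Fahrenheit conversion (A = 9/5, B = 32); the four sample
temperatures and the function names are arbitrary choices for the
illustration.

```python
def celsius(c):
    """Identity scale: the reading in degrees Celsius."""
    return c


def fahrenheit(c):
    """Equivalent interval scale A*N(x) + B with A = 9/5 and B = 32."""
    return 9.0 * c / 5.0 + 32.0


def difference_ratio(n, x, y, z, w):
    """[N(x) - N(y)] / [N(z) - N(w)] for a scale function n."""
    return (n(x) - n(y)) / (n(z) - n(w))


x, y, z, w = 30.0, 10.0, 20.0, 10.0   # arbitrary temperatures, in Celsius

ratio_c = difference_ratio(celsius, x, y, z, w)
ratio_f = difference_ratio(fahrenheit, x, y, z, w)
value_ratio_c = celsius(x) / celsius(y)
value_ratio_f = fahrenheit(x) / fahrenheit(y)

print(ratio_c, ratio_f)              # ratio of differences on each scale
print(value_ratio_c, value_ratio_f)  # ratio of the readings themselves
```

The ratio of differences is the same on both scales, while the ratio of the
readings themselves is not. Statements such as "twice as hot" are thus not
meaningful on an interval scale.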

4.  Ratio scales. A ratio scale is one which is fixed except for one
arbitrary (multiplicative) constant. Most common physical quantities are
of this sort: length, weight, time duration, etc. Various alternative sets
of scales (English, metric, etc.) are interchangeable by multiplication by
suitable constants. In the case of ratio scales, only one reference object
is necessary to fix the scale; the 0 comes for free. In technical terms,
there is a fixed, absolute zero.

5.  Absolute scales. An absolute scale is one for which there is no
freedom whatsoever; the scale is completely fixed. The best known scale of
this sort is the scale of cardinal numbers, those used for counting, tallying
and so on. Another absolute scale is probability. The reference points, 0
and 1, are fixed. This feature of probability plays an important role in
trying to tie the theory of subjective probability to the theory of objective
probability measures.

In addition to these 5 typical kinds of scales, there are many variants.
In the discussion of group utilities, the kind of scale obtained by introducing
reference points for ordinal scales will play an important role in resolving
inconsistencies between individual utility judgments. Psychologists and
social scientists frequently use category scales, i.e., a sequence of "soft"
reference points specified by verbal descriptions such as very desirable,
desirable, neutral, undesirable, very undesirable. Numbers may be attached
to these categories, very desirable = 5, very undesirable = 1, etc. Whether
manipulating these numbers in usual arithmetic fashion (e.g., taking averages
or standard deviations) is justifiable depends on properties of the numbers
that are rarely tested in practice.
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One  of  the  guiding  principles  of  the  application  of  group  judgment  to 
uncertain  problem  areas  is  the  availability  of  a remarkably  rich  assortment 
of  judgmental  scales.  Humans  have  a rather  astonishing  ability  to  quantify 
practically  any  aspect  of  a problem,  at  least  in  a rough  "intuitive"  way. 
However,  whether  the  numbers  generated  by  human  judgment  have  the  nece.ssary 
properties  to  justify  treating  them  as  matheimitical  entitles  requires  demon- 
stration. This  is  just  as  true  where  the  individual  is  trying  to  estimate  a 
well  known  physical  quantity,  such  as  length,  or  time  duration,  as  it  is  wnen 
the  quantity  is  "subjective,"  such  as  desirability.  The  psychological  magni- 
tude defined  by  tVie  judgments  may  not  have  the  same  mathematical  properties 
as  the  physical  magnitude  being  estimated.  We  will  encounter  a situation  of 
this  sort  in  the  psychonumeric  phenomenon  discussed  below. 
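
The caution about doing arithmetic on category-scale numbers can be made
concrete with a small sketch. Everything in it is invented for illustration:
the two groups of responses and the second coding ("stretched_top") are
assumptions, not data from any study. It shows only that two codings which
preserve the order of the five categories can disagree about which group has
the higher average.

```python
from statistics import mean

# Two order-preserving numerical codings of the same five categories.
# Both are hypothetical; the text fixes only 5 and 1 as the end points.
equal_steps = {
    "very undesirable": 1,
    "undesirable": 2,
    "neutral": 3,
    "desirable": 4,
    "very desirable": 5,
}
stretched_top = {**equal_steps, "very desirable": 10}

# Invented responses from two groups of four judges.
group_a = ["neutral"] * 4
group_b = ["undesirable"] * 3 + ["very desirable"]


def mean_rating(coding, responses):
    """Average the numbers attached to a list of category responses."""
    return mean(coding[r] for r in responses)


for name, coding in [("equal steps", equal_steps), ("stretched top", stretched_top)]:
    print(name, mean_rating(coding, group_a), mean_rating(coding, group_b))
```

Under the first coding group A has the higher mean; under the second the
order reverses, although no judge changed a response. An average is
informative only if the spacing between categories means something.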

One final topic needs to be examined before leaving the discussion of
estimation spaces, namely, the identification of a universe of discourse.
This topic is full of obscurities and quasi-paradoxes, and I wouldn't bring
it up at all if it were not one of the crucial aspects of increasing the
solidity of judgment. The question is, how are the boundaries of the
estimation space delimited; how can one specify what is to be included and what is
to be left out? One appealing point of view (at first glance) is the
straightforward proposal: Why leave anything out? Why not specify the
elements of the problem you are interested in, and then sweep everything else
into a "throw away" category "everything not included in the above"? In this
way you don't clutter up your event space with every possible state of every
possible universe, but at the same time, you have at least a weak guard
against omitting a crucial feature that is not initially apparent.

Unfortunately, this ingenuous suggestion runs into a host of difficulties
when the event space is uncertain. One type of trouble is illustrated by the
well-known paradox of confirmation, as formulated by Hempel. A
statement such as "All ravens are black" is logically equivalent to the
contrapositive "All non-black things are non-ravens." This creates no
difficulties as long as we are dealing with a tidy universe of true-false
assertions. However, if we are concerned with the messier world of incomplete
information, and examine the equivalence of these two with respect to
confirming evidence, an embarrassing situation arises. According to
well-established custom, anything that is both a raven and black confirms the
direct formulation of the general statement. But by the same custom,
anything that is both a non-raven and non-black confirms the second, and hence
by logic, confirms the first. Thus, if we go down to the beach, each brown
grain of sand is a confirming instance for the assertion "All ravens are
black." We have a ready made store of billions of confirming instances!

There doesn't seem to be much doubt that the pathology here is related
to the universe of discourse for the sentence in the doubtful mood.
Extending the universe of discourse to include a beachful of sand results in
allowing non-relevant cases. In fact, at first blush, something which is neither
a raven nor black seems to be beside the point, and you might want to
contract the universe of discourse to include only those things which are either
ravens or black. You are forced by logic to do something drastic, because
the negation sweeps in the whole remainder of the universe. However, that
won't do either.

To see this, we have to generalize the problem slightly. Suppose you are
a psychology graduate student, doing research for your Ph.D., and you want to
do experiments to establish (or, heaven forbid, reject) an hypothesis you have
conjectured to the effect that a given procedure A will influence positively
the performance of a task, B. You divide people into two sorts, those that
have received treatment A and those who haven't, and redivide them into those
who exceed a given criterion on task B (call them B+), and those who don't
(call them B-). You recruit a group of students from the department subject
pool (students who earn departmental brownie points by volunteering for duty
as subjects in experiments) and proceed to expose them to treatment A and
count the number of B+'s and B-'s so produced. You write it up, with the
test statistic for one degree of freedom duly noted, and your lab advisor hits
the ceiling. Why? You didn't have a control group.

The  situation  can  be  diagrammed  by  the  matrix 

             B+    B-
    A        x     y
    not A    z     w

You obtained results for the first row, x and y. But nothing in that
experiment guarantees that the proportion x/y is not precisely the proportion you
would have observed if the subjects had not been exposed to your treatment;
perhaps x/y = z/w, in which case the treatment has no effect. But notice,
if you now dutifully go back to the lab and run a control group (no
treatment) and count z and w, you are doing exactly what seemed so strange for
the ravens and blackness. Your hypothesis is that A has a positive effect
on B+; if there are no non-A's that are B-, your hypothesis is in trouble.
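
The role of the second row can be sketched numerically. The counts below are
hypothetical, chosen so that the treated group looks impressive on its own;
the function name is likewise an assumption made for the illustration. The
sketch shows only that the first row cannot reveal an effect until it is
compared with the control row.

```python
def success_rate(successes, failures):
    """Proportion of subjects in one row who exceed the criterion (B+)."""
    return successes / (successes + failures)


# Hypothetical counts for the matrix in the text.
x, y = 30, 10   # received treatment A: B+, B-
z, w = 30, 10   # control group (not A): B+, B-

treated = success_rate(x, y)       # looks like a strong result by itself
control = success_rate(z, w)       # but the untreated subjects do as well
difference = treated - control     # apparent effect of the treatment

# x/y = z/w is the same condition as x*w - y*z = 0.
cross_product = x * w - y * z

print(treated, control, difference, cross_product)
```

With these invented counts three quarters of the treated subjects reach B+,
yet the difference from the control group is zero, which is the case
x/y = z/w described above.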

The apparent difference between the two cases stems from an illusion
induced by the universal form of the statement "All ravens are black." If
you restrict your universe to ravens, then your assertion is equivalent to
"Everything is black." If you allow some non-ravens, then there had better
be some non-ravens which are non-black, or the assertion is still "Everything
is black."

The embarrassment I mentioned is just that there are no rules that anyone
has been able to think of that do not outrage logic and at the same time
exclude the silly consequence of accepting a grain of sand as a confirming
instance for "All ravens are black." Actually, the Ph.D. student is saved
from thinking this through by the fact that the psych pool is a well defined
universe. That is one reason that many "hard-headed" people in the "real
world" are dubious of the "transfer of laboratory findings."

The paradox of confirmation is perhaps the most dramatic pathology
connected with selecting a universe of discourse, but there are others. If you
look into logic texts, the requirement for a specified universe of discourse³
is usually stated. In some, the possibility of getting into trouble if the
universe is interpreted too generously is given lip service, and is then
promptly dropped. You will see illustrations of the Venn diagram for a logic
problem that look like Figure 4, where the square box is the universe, and the
ovals are the classes of things of interest. Illustrated is the statement
"All ravens are black": the oval representing ravens is included in the oval
representing black things. Also illustrated is the statement "Some swans are
black."

Figure 4. Normal Universe of Discourse.


Interestingly, you never see a diagram that looks like Figure 5, which
shows what is going on outside the neat world you have enclosed. You never
meet statements like "All ravens in my world are black (but some outside
are not)."

What the logic texts never deal with is what happens if you change the
boundaries, if you expand or contract your universe. In pure logic that is
not an exciting question, especially if you are careful to keep all the
classes you are interested in well inside the boundaries. Relations
definable in pure logic, such as class inclusion, overlap, exclusion, and the like,
remain invariant under changes of the boundary. This is quite different if
the universe is uncertain, and you are concerned with things like
confirmation or probability.

Ordinarily, the probability of the universe is defined as 1. The Venn
diagram is a useful device to illustrate that the probability of a class can
be represented by the ratio of its area to the area of the universe. Suppose
the universe is the class of people living in the United States. The
probability that someone living in the United States has a Ph.D. is rather small,
which can be represented by a tiny area. The probability that someone in the
United States is female can be boldly represented by a line cutting the
universe in half. Obviously, if you expand the universe to the population of
the world, the probabilities of some classes are going to change, e.g., the
probability of making an income over $10,000 a year.

What is much more interesting is that relations between classes which
are thought to be fundamental in probability theory can change quite
drastically. One of the basic notions in applied probability theory is independence
and its side-kick dependence. A property A is said to be independent of a
property B if the probability of A, given that B obtains, is the same as the
probability of A in general. Or, equivalently, the two are independent if
the probability of the conjunction A and B is equal to the product of the
probabilities of the two separately.

Consider a classic type of example, an urn in which there is a certain
number, say 100, of poker chips. Suppose there are two shapes, round and
square, and two colors, green and blue. The chips are designed so 50 are
round and 50 are square. Similarly, 50 are green and 50 are blue, and the
shapes and colors are mixed so that 25 of the round chips are green and 25
of the square chips are green. Thus the probability of drawing any of the
combinations (square and blue, e.g.) is precisely one quarter, and the
conditions for independence are met; the probability of drawing a square blue
chip = 1/4 = 1/2 x 1/2 = probability of drawing a square chip x probability of
drawing a blue chip.

Now suppose we dilute the chips in the urn with an additional 100 oval
white chips.* This corresponds to expanding the universe of discourse. The
relative proportions of round and square and green and blue chips have not
changed (technically, the relative probabilities of these characteristics have
not changed), but the probability of drawing a round chip has been cut in
half, and the probability of drawing a green chip has also been halved. In
addition, the probability of the combination, round and green, has also been
cut in half. Thus the probability of drawing a square blue chip = 1/8, which
is not equal to probability of a square chip x probability of a blue chip
= 1/4 x 1/4 = 1/16. The characteristics are no longer independent.

* The same effect could be achieved by leaving the original urn unchanged,
and introducing a second urn containing 100 oval white chips, then flipping
a coin to determine which urn would be drawn from.
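
The two urns can be checked by direct counting. The sketch below uses the
chip counts given in the text; the helper names (`urn`, `joint_and_product`)
are made up for the illustration. It compares the probability of a square
blue chip with the product of the separate probabilities, before and after
the 100 oval white chips are added.

```python
from fractions import Fraction


def urn(counts):
    """Return a function giving the probability that a drawn chip satisfies a test."""
    total = sum(counts.values())

    def prob(test):
        favorable = sum(n for chip, n in counts.items() if test(chip))
        return Fraction(favorable, total)

    return prob


def joint_and_product(counts):
    """P(square and blue) alongside P(square) * P(blue) for one urn."""
    p = urn(counts)
    joint = p(lambda chip: chip == ("square", "blue"))
    product = p(lambda chip: chip[0] == "square") * p(lambda chip: chip[1] == "blue")
    return joint, product


original = {
    ("round", "green"): 25, ("round", "blue"): 25,
    ("square", "green"): 25, ("square", "blue"): 25,
}
diluted = dict(original)
diluted[("oval", "white")] = 100   # expanding the universe of discourse

print(joint_and_product(original))  # joint equals product: independent
print(joint_and_product(diluted))   # joint is twice the product: dependent
```

Nothing about the original 100 chips has changed in the second urn; only the
denominator has, and that is enough to destroy independence.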


This example may seem artificial to someone who is accustomed to urn
drawing illustrations of probability relations. But a little reflection on
Figure 5 should make it clear that the seeming artificiality arises from
posing a question which is outside standard logic, and not from some insight
stemming from the principles of logic.

There are some rules which have been established by logicians. These are
rules which attempt to exclude the most serious form of logical pathology,
namely antinomies or logical contradictions. Thus, potential contradictions
associated with the words "true" and "false" have led logicians to partition
the universe of discourse into different levels of language, each of which can
refer only to languages below it. Similarly, contradictions associated with
a too-liberal usage of the notion class have been excluded by a variety of
restrictions such as partitioning classes into a hierarchy (theory of types)
where a class can include only classes immediately below it in the hierarchy,
or restricting classes to those that can be defined in a specific way starting
from a fixed set of initial classes, and the like.

Footnote: If you allow statements of the form "This sentence is false," where
the "this" refers to the sentence in quotes, then, if you assume the sentence
is true, since it says it is false, it must be false. Conversely, if you
assume it is false, it says it is false, and therefore must be true. The
contradiction involving classes has the same sort of self reference.
Suppose, following Bertrand Russell, you define the number three as the class
of all classes that have three members. Now, there are certainly more than
three classes which have three members, so the number three does not belong
to the class of things with three members. Hence, the number three is a
class which does not contain itself. Now contemplate the class of all such
classes, namely the class of all classes which do not contain themselves,
and call it A. Does A contain itself? If it does, by definition it doesn't;
but if it doesn't, then by definition it does.

These grand logical restrictions are well to keep in mind, but usually
they are not serious bugaboos for the practitioner. Not many practical
decisions involve investigating the consequences for policy of the world
belonging to a class that doesn't include itself. But the type of puzzle
represented by the paradox of induction, or the relativity of the notion of
independence to the selected universe of discourse, are precisely the sort
of thing that can bedevil practical decisions.

The notion of universe of discourse has affinities to the notion of
"closed system" in physical science. Nowadays it is not too difficult to
define a closed system; e.g., it can be defined as a region of space such
that (during the time interval of interest) no energy flows either way across
the boundary. Once upon a time, when the notion of energy was not as clear
as today, it might have been much more difficult to say sharply what a closed
system is. At the present time, there is no similar summative notion for
decisions. The term "information" is beginning to assume some such role, and
perhaps we could define a decisional universe of discourse as one for which
no information flows across the boundary during the time period of interest.
This is not proposed as a definition. Neither the term "boundary" nor the
term "information" is sufficiently well defined to make a technical definition
appropriate.

There is one attempt in the literature to pin down the problem under
discussion more than I have indicated; this is the treatment of grand and
small worlds by L. J. Savage. His notion of small world (one that is
decisionally manageable) is not too far from the notion of universe of
discourse as I have loosely introduced it here. Savage assumes there is one
grand world within which any small world can be identified by aggregating
states of the grand world. For example, the small world "Rain tomorrow" or
"No rain tomorrow" with potential acts "Start for a drive in the country
tomorrow morning" or "Stay home," can be defined by lumping under Rain all
the possible conditions of the world compatible with rain, and similarly
lumping with no rain all possible conditions compatible with no rain. The
two acts are conceived as each having an extension which defines what will
happen (outcomes) given any of the possible states of the grand world lumped
under rain or no rain. Presumably two different small worlds are compatible
if they have the same probability functions and the same utility functions
in the grand world.
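
The aggregation step can be sketched in a few lines. The four "grand world"
states and their probabilities below are invented, and a real grand world
would be vastly larger; the function name `small_world` is also an assumption
of this sketch. It illustrates only the lumping of detailed states into the
two events "Rain" and "No rain."

```python
def small_world(grand_world, label):
    """Lump grand-world states into small-world events, summing probabilities."""
    events = {}
    for state, probability in grand_world.items():
        event = label(state)
        events[event] = events.get(event, 0.0) + probability
    return events


# Hypothetical grand-world states: (weather, temperature) -> probability.
grand_world = {
    ("rain", "cold"): 0.2,
    ("rain", "warm"): 0.1,
    ("dry", "cold"): 0.3,
    ("dry", "warm"): 0.4,
}

rain_world = small_world(
    grand_world, lambda state: "Rain" if state[0] == "rain" else "No rain"
)
print(rain_world)
```

A different labeling function applied to the same grand world (say, by
temperature) yields a different small world.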

Essentially what Savage is suggesting is that all decision problems have
the same universe of discourse (for a given decision maker) and that universes
of discourse for specific decisions be formulated by aggregating in some
appropriate fashion the elements of the grand universe. Difficulties with this
program will be discussed more fully in the section on personal probabilities.
In  essence,  the  grand  world  is  simply  too  grand.  Difficulties  arise  which 
are  analogous  to  trying  to  measure  the  diameter  of  the  physical  universe  with 
a yardstick. 

To sum up this quite unsatisfactory discussion of universes of discourse:
Before anything interesting in a formal sense can be done with individual or
group estimates, an event space E and a response space R must be specified.
This implies that a meaningful question cannot be asked unless the individual
making the response knows a fair amount about the topic "a priori." For
example, the question "How high is that tree?" cannot be understood without
knowing quite a bit about measurement, about the heights of everyday objects,
and something about trees, e.g., that they don't change their heights within
the space of a few seconds. This prior knowledge is part of the universe of
discourse implied by the question. At the moment, there does not appear to
be a well-defined technique for specifying this implicit knowledge, or
determining the extent to which it limits the potential responses to the
question.

Most of the discussion in this book will center around three types of
event spaces: (a) a simple list, (b) a set of classes (types) of events,
and (c) a simple Euclidean space, i.e., real number continua of one or more
dimensions. However, it may be worth warning the reader that these rather
tidy event spaces are drastic simplifications of the intricate conceptual
contexts in which individual judgments are normally formulated.

3. Models of Estimation

We turn now to some specific models of the estimation process. As I
commented at the beginning of this chapter, at the present time there does
not appear to be a unified model of the entire process as outlined in Fig. 2.
Rather, fragments of the process have been modeled. These fragments are of
critical value in establishing some of the important properties of group
judgment, but, as fragments, leave some major gaps when it comes to
formulating a well-rounded set of guidelines for group judgment.

Introspection gives a rather bewildering impression of the estimation
process, especially of the generation step. At times the number "just comes."
At other times, the mind appears to engage in a miniature reasoning process,
frequently of a "narrowing down" sort. The following is an outline of the
way my daughter (who is left handed) arrived at the answer to the question
"What is the proportion of people in the U.S. who are left handed?"

"I know that left handers are not in the majority in the U.S.,
therefore it is less than 50%. Maybe 10%. But if the proportion
were as large as 10% manufacturers would make a lot of things for
left handed people, like scissors. But these are rare. So maybe
it's like half of 10%, say between ... and ..."

According to the Encyclopaedia Britannica, she should have reduced the
figure by another factor of two. But in any event, her line of reasoning was
relatively clear.

The view that "thinking" is a quasi-logical reasoning process has been
adopted by a number of researchers in the field of artificial intelligence.
One of the basic tools in this approach is the elicitation of "protocols,"
introspectively generated descriptions of how the individual deals with a
problem. On the basis of these protocols, "heuristics," incomplete algorithms,
are generated as approximations to the thought processes of the individual.
The heuristics are incomplete in the sense that they do not guarantee the
solution to a problem, but usually they do guarantee that if a solution is
encountered during the process, the heuristic will recognize it as such.

The heuristic approach has achieved some success in dealing with
well-structured problems like computer game playing routines for checkers and
chess, and theorem generators for elementary logic. The heuristic model fits
a great deal that is observed in introspection, and the artificial
intelligence program has generated a valuable stock of algorithms for attacking
some kinds of problems, in particular, powerful routines for searching large
spaces of possibilities.
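
What "incomplete" means for a heuristic can be shown with a toy problem that
is not drawn from the artificial intelligence literature cited here; it is
an invented stand-in. The task is to pick numbers from a list that add up
exactly to a target, and the heuristic is "take the largest number that still
fits." The rule recognizes a solution when it meets one, but it does not
guarantee that it will meet one.

```python
def greedy_subset(numbers, target):
    """Largest-first heuristic for picking numbers that sum to target.

    Returns the chosen numbers if the running total hits the target,
    otherwise None. A None result does not prove that no solution exists.
    """
    chosen, total = [], 0
    for n in sorted(numbers, reverse=True):
        if total + n <= target:
            chosen.append(n)
            total += n
        if total == target:   # a solution is recognized as such
            return chosen
    return None


print(greedy_subset([6, 4, 3], 10))   # finds 6 + 4
print(greedy_subset([6, 5, 5], 10))   # misses 5 + 5 after committing to 6
```

The second call fails even though a solution exists, because the heuristic
commits to 6 first and never revisits that choice.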

Nevertheless, the heuristic model does not appear to be particularly
useful at this stage of the game for dealing with the problem of group decisions.
There appear to be two reasons for this. In the first place, the kinds of
estimates required for decisions have a remarkably miscellaneous quality.
Each question, viewed as a reasoning process, requires a separate logical
pattern which is surprisingly ad hoc. There does not appear to be a formal
way to deal with the wide variety of miniature models encountered in practice.

The second reason is more fundamental. The heuristic approach requires
that the estimate be posed as a problem, i.e., as a set of conditions for
which a well-defined solution can be described. Estimation can rarely be
couched in this form. Perhaps the most frustrating aspect of research with
estimation is the lack of simple criteria to determine when the "right"
answer has been obtained. It is even very difficult to formulate rules
specifying when a given step is in the right direction. To the extent that
such rules can be formulated, they are likely to be of a probabilistic nature:
"go this way and you will probably get a better estimate."

I'm not sure to what extent these comments represent limited imagination
on my part, and to what extent they express objective features of human
judgment. One major field of artificial intelligence research, pattern
recognition, appears close to some of the approaches to estimation that will be
elaborated below. Some of the most interesting experiments with communication
in group processes have used tasks that can be expressed sharply as problems
for the group to solve, with a clearly defined criterion of solution. Among
these are the experiments by Bavelas with communication networks, where the
task can easily be solved by a single individual having all the fragmentary
information initially spread among the group.

One elementary way to represent the information available to an individual
is a subset in an event space. Suppose the question is (for a doctor) "Will
this patient die?" The doctor has a certain amount of information about the
patient: symptoms, life stage, life style, etc. This can be represented in
the space of patients as one type, or class, or set of patients, I_i, the
subscript i indicating that this information is available to doctor i. The
question of interest is to what extent this set overlaps the set D of those
patients who die (within a given short time after examination), as illustrated
in Figure 6a.

To extend Figure 6a to the group case, we need only assume that there are
several doctors, each of whom classifies the patient in a set I_i, and take the
intersection of these sets as the group judgment, G = \prod_i I_i, where \prod
is the logical product, as illustrated in Figure 6b.

In  this  simple  case,  the  group  estimation  process  consists  in  determining 
the  intersection  of  the  knowledge  sets  of  the  doctors  and  relating  this  common 
set to the set of those who will die. The common set will be smaller than any
individual set, and thus is more likely to be either totally within or totally
outside the set of interest, D. The example is extremely elementary, but it
illustrates  a possible  approach  to  group  judgment  which  invokes  only  notions 
from  formal  logic.  Although  there  seem  to  be  possibilities  inherent  in  such 
approaches,  I have  found  them  somewhat  unproductive.  As  I remarked  earlier, 
that  may  be  a limitation  of  my  own  thinking. 
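
For readers who like to see the set operations spelled out, here is a
minimal sketch of the doctors example. The patient numbers, the three
knowledge sets, and the set D are all hypothetical, and D is treated as known
only so that the overlap can be computed; in practice it is exactly what the
doctors are trying to judge.

```python
from functools import reduce

# Hypothetical space of 12 patients, identified by number.
D = {1, 2, 3, 4}   # patients who die (assumed known for the illustration)

# Knowledge set I_i of each doctor: the class of patients the doctor's
# information places this patient in.
I = {
    "doctor_1": {1, 2, 3, 5, 6},
    "doctor_2": {2, 3, 4, 5},
    "doctor_3": {2, 3, 7},
}

# Group judgment G: the intersection (logical product) of the I_i.
G = reduce(set.intersection, I.values())


def overlap(s):
    """Fraction of the set s that falls inside D."""
    return len(s & D) / len(s)


for name, knowledge in I.items():
    print(name, overlap(knowledge))
print("group", sorted(G), overlap(G))
```

In this invented case each doctor's set straddles the boundary of D, while
the intersection falls wholly inside it, which is the narrowing effect
described above.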

I have found a rough analogy useful in trying to think about estimation.
We can conceive of the estimation process in two ways. One is the traditional
way of thinking of it as a highly structured reasoning process analogous to
the precisely orchestrated steps of deductive inference. We could call this
the algorithmic view of estimation. The other point of view might be called
the chemical orientation. Information, rather than being put together in an
intricate and stylized fashion, is mixed or blended like ingredients in a
brew. The analogy is similar to the contrast between mechanics and
thermodynamics. In mechanics, the detailed configuration of the elements and forces
is used to predict the behavior of the system; in thermodynamics, on the other
hand, gross averages of the properties of the particles are used to predict
gross averages at a later time. Knowing the pressure and temperature and heat
inflow of a system, it is not necessary to know the locations of individual
molecules to predict the pressure and temperature at a later time.

The analogy is perhaps only suggestive; but it allows using notions such
as mixing, diffusion, adding or subtracting amounts of information and the
like with a freedom of conscience that is hard to attain for one steeped in
the rigors of formal logic.

4. Factor Models

One fruitful approach to a theory of estimation is the family of factor
models. On this approach, the output of the memory search activity is a set
of relevant factors (cues, components, items of information, etc.). The
estimate R is assumed to be a function of this set of factors,

\[ R = F(f_1, \ldots, f_k, \ldots, f_n) \tag{1} \]

Although in theory F could be about anything, only a small range of the
possibilities has been explored. At one extreme are the single variable
psychophysical laws, R = F(x), where x is a physical magnitude (the stimulus) such
as weight, sound intensity, and the like, and F is a power law (S. S. Stevens)
or a logarithmic law (the classic Weber-Fechner law). A relatively
sophisticated formulation is the algebraic model approach of Anderson.

The most widely exploited form of F is one of the simplest, namely the
linear form. On this approach, the output of the evaluation step is a set
of weights which perform the triple function of expressing the relevance
of each factor, of discounting the factor for solidity, and of scaling the
numerical value of the factor to match the size of the required estimate. The
estimate is then given by

\[ R = \sum_{k} w_k f_k + c \tag{2} \]

i.e., the estimate is a weighted sum of the factors, where c is an additive
constant.
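
A minimal sketch of the linear form (2) follows. The function name and the
numbers are hypothetical; they are chosen only to show that each weight w_k
multiplies its factor f_k and that c shifts the result.

```python
def linear_estimate(factors, weights, c=0.0):
    """Weighted sum of factors plus an additive constant, as in formula (2)."""
    if len(factors) != len(weights):
        raise ValueError("each factor needs exactly one weight")
    return sum(w * f for w, f in zip(weights, factors)) + c


# Invented values for three factors.
factors = [2.0, 5.0, 1.0]
weights = [0.5, 0.2, 1.0]   # each w_k folds together relevance, solidity, scale
R = linear_estimate(factors, weights, c=0.5)
print(R)
```

Because a single number w_k carries relevance, solidity, and scaling at once,
nothing in the computation can tell the three apart.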


The linear model has been utilized to describe multi-cue perception,
complex value functions, and more general kinds of estimation.

The notion of "solidity" has not received a large amount of attention
in the psychological literature. Probably, the reason is that most
experimental investigations have dealt with the case where the factors are "given"
either as environmental cues in the case of perceptual tasks, or as
experimenter-furnished values in other tasks. In these experiments, the factors
are all "completely solid;" the only problem for the subject is to assess
how significant they are for the given estimate (relevance). In the more
general case we are examining, the factors are self furnished and the
additional consideration of how well-established the factors are is a significant
part of the task.

Unfortunately, on the linear model, there is no direct way to separate
these three functions. It would be feasible in theory to assume that
w_k = g(s_k, r_k, m_k), where s_k is the individual's assessment of the solidity of
the given information, r_k is his assessment of its relevance, and m_k is some
assessment of the relative size of the factor and the desired estimate. As
an example for m_k, consider the question "How many telephones are there in
Africa?" One relevant factor is income. The estimator might reason, "The
average income in Africa is very low, no more than a few hundred dollars per
year; thus the number of telephones is probably small." The average income
is not the same "size" as the number of telephones. Thus a scaling factor
is needed to bring the two in line.

Whether at this stage of the game it is worth introducing all of this
complication into the model is difficult to determine. Individuals can
differentiate between relevance and solidity, and the size feature is an
obvious consideration. However, for purposes of applying the factor model to
group decisions, it is difficult to see how much more intricacy than formula
(2) can be used.

Some  obvious  variants  of  (2)  appear  worthy  of  notice.  As  we  shall  see, 
there  is  reason  to  believe  that  for  many  kinds  of  estimates  individuals  scale 
their  responses  on  the  logarithm  of  the  quantity  being  estimated.  In  those 
cases  the  formula 

$$r = \sum_k w_k \log f_k + c \qquad (3)$$

is more appropriate, where r is the logarithm of the individual's response,
$R = e^{r}$.*

To  compensate  for  different  sized  factors,  and  also  to  compensate  for 
possible  large  differences  in  range  or  variability  of  the  factors,  a commonly 
used  transformation  is  the  z score 

$$z_f = \frac{f - \bar{f}}{s_f}$$

where $\bar{f}$ is the mean of the values of a given factor f, and $s_f$ is
the observed standard deviation of the factor. The estimate then becomes


$$R = \sum_k w_k z_{f_k}$$

* (3) is equivalent to the statement $R = e^{c} \prod_k f_k^{w_k}$, where
$\prod$ denotes the product. Note that in this form, the weights appear as
exponents.
For estimation problems where the factors are reasonably well defined and
objective, F can be identified using multiple correlation (linear estimation)
techniques. Given a sufficiently large set of estimates by an individual of
a specific type of quantity, an optimal (for the given data) linear model of
the individual's estimates can be computed. For example, a number of
experimenters have investigated the ability of college students and faculty
members to predict the first year grade-point average of entering students,
based on three factors: a score on a college entrance examination, the
high-school grade-point average, and a rating of the excellence of the high
school. The multiple regression of the individual's estimate against these
three factors furnishes a set of weights which, with a little care, can be
interpreted as the relative importance that the individual attaches to each
factor.
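As a minimal sketch of how such weights might be computed, the following fits the linear form (2) to a set of estimates by ordinary least squares. The function name `fit_linear_judgment_model`, the synthetic factor values, and the "true" judge weights are all assumptions made for illustration; none of them come from the studies cited.

```python
import numpy as np

def fit_linear_judgment_model(factors, estimates):
    """Least-squares fit of estimates ~ factors @ w + c; returns (w, c)."""
    factors = np.asarray(factors, dtype=float)
    estimates = np.asarray(estimates, dtype=float)
    # Append a column of ones so the additive constant c is fitted too.
    design = np.column_stack([factors, np.ones(len(factors))])
    coef, *_ = np.linalg.lstsq(design, estimates, rcond=None)
    return coef[:-1], coef[-1]

# Hypothetical data: 200 cases, three standardized factors, and a judge
# whose estimates follow a linear rule plus a little random variation.
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 3))
estimates = (factors @ np.array([0.6, 0.3, 0.1]) + 2.5
             + rng.normal(scale=0.05, size=200))

weights, constant = fit_linear_judgment_model(factors, estimates)
print("fitted weights:", np.round(weights, 2), "constant:", round(constant, 2))
```

Because the factors here are on a common scale, the fitted weights can be read directly as relative importance; with raw factors of different sizes they would also absorb the scaling function discussed above.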

The commonly employed figure of merit (score) for factor models is
correlation with the true answer. Given a computed model, there are two
potential scores, the correlation of the "raw" estimates of the individual
with the true answer, and the correlation of the estimates computed from the
individual's model with the true answer. On the estimation of grade-point
averages, both students and faculty make a relatively poor showing. Average
correlations range around .3, even for faculty with experience in college
admissions.

A rather surprising result of these investigations has been that the
individual's model uniformly outperforms the individual. This result has
been labelled bootstrapping by investigators. [12] A common interpretation of
this result is that individuals are more variable than their model.
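The variability interpretation can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reconstruction of the cited experiments: the judge is assumed to apply a fixed linear rule plus independent random noise, and the function name, weights, and noise levels are hypothetical.

```python
import numpy as np

def bootstrapping_demo(n_cases=2000, noise_sd=1.0, seed=0):
    """Return (r_raw, r_model): correlation with the true answer of the
    judge's raw estimates and of the estimates from the judge's fitted model."""
    rng = np.random.default_rng(seed)
    factors = rng.normal(size=(n_cases, 3))
    # Assumed environment: truth depends linearly on the factors plus noise.
    truth = factors @ np.array([0.5, 0.3, 0.2]) + rng.normal(size=n_cases)
    # Assumed judge: a slightly different linear rule, applied inconsistently.
    raw = (factors @ np.array([0.4, 0.4, 0.2])
           + rng.normal(scale=noise_sd, size=n_cases))
    # Model of the judge: regress the raw estimates on the factors.
    design = np.column_stack([factors, np.ones(n_cases)])
    coef, *_ = np.linalg.lstsq(design, raw, rcond=None)
    modeled = design @ coef
    r_raw = np.corrcoef(raw, truth)[0, 1]
    r_model = np.corrcoef(modeled, truth)[0, 1]
    return r_raw, r_model

r_raw, r_model = bootstrapping_demo()
print(f"raw estimates: r = {r_raw:.2f}; judge's model: r = {r_model:.2f}")
```

The fitted model keeps the judge's systematic weighting and discards the case-to-case variation, which is why it scores higher in this setup.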

In addition to the model of the individual's estimate, there is a
corresponding model of the relationship between the true answer and the
factors; thus we can write

$$T = G(f_1, \ldots, f_m) \qquad (5)$$

Since (5) is the mirror of (1), the combination of the two was dubbed
the "lens model" by Brunswik. If (5) is also treated as a linear
relationship, then it is clear that the correlation of R and T furnished by
(2) can be no better than the multiple correlation of T and the factors.
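The ceiling can be checked numerically. In the sketch below (hypothetical data and names throughout), the multiple correlation of T with the factors is computed by least squares, and an arbitrary linear judge of the form (2) is compared against it.

```python
import numpy as np

def multiple_correlation(factors, target):
    """Correlation between target and its best linear fit on the factors."""
    design = np.column_stack([factors, np.ones(len(factors))])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return np.corrcoef(design @ coef, target)[0, 1]

rng = np.random.default_rng(1)
factors = rng.normal(size=(500, 3))
# Assumed linear environment (5): T is a weighted sum of factors plus noise.
truth = factors @ np.array([0.5, 0.3, 0.2]) + rng.normal(size=500)

ceiling = multiple_correlation(factors, truth)

# An arbitrary noiseless linear judge of form (2), with mis-set weights.
judge = factors @ np.array([0.2, 0.7, 0.1]) + 4.0
achieved = np.corrcoef(judge, truth)[0, 1]

print(f"ceiling = {ceiling:.3f}, achieved by this judge = {achieved:.3f}")
```

The least-squares fit maximizes the sample correlation over all linear combinations of the factors, so no choice of weights in (2) can exceed the ceiling on the same data.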

A lively debate has been going on for a decade or more concerning the
significance of research into factor models for professional decisions. It
is a truism that the objective model (5) will outperform intuitive judgments,
providing the model is correct. If individual judgments can be approximated
very well by linear models, then an optimal linear model of the sort (2)
will outperform the individuals. In many types of clinical judgments, linear
models have proved to be good approximations, and, in fact, as the
bootstrapping phenomenon indicates, better than the individual. This raises
the question whether certain kinds of professional judgment can be replaced
by models: by optimal objective models where sufficient data exist to compute
the models, otherwise by models of the individual's judgment. A rather
radical suggestion along these lines, proposed by Robin Dawes, will be
discussed in Chapter IV on nominal judgment.

These proposals appear to have made little headway in professional
circles. The reason may be that professional people resist giving up certain
roles, or just cultural lag, or possibly the fact that professional judgment
usually cannot be reduced to a few well-specified types of estimates. The
doctor must not only decide how sick a given patient is on the basis of a
pre-specified set of symptoms, but also which set of symptoms to examine,
what course of treatment to undertake, when to terminate treatment, and the
like. There are active investigations underway studying whether these broader
contexts can also be reduced to well-defined models, either based on
objective data, or on professional judgment, or both.

The issues raised by efforts to model professional judgment are
all relevant to the "one-head rule." In the most general case, where the
judgments of interest cannot be embedded in a well-defined family of
judgments, and where the information is miscellaneous and non-objective,
attempts to model the process would probably not be productive; each
estimation task would require its own special model. However, this is not
definitive. It is possible that even in these extreme cases, attempts to
identify at least the major factors which influence the judgments, and to
formulate a rough, linear model, may produce results which are more accurate,
and more reliable (in the sense of including less random variation) than less
systematic methods of arriving at estimates.

5. Probabilistic Models

One feature that appears to be lacking in the factor theory of estimation
is an explicit statement of the degree of certainty of the judgment.
An individual can take account of his own uncertainty (in the ingredients)
via the weights he attaches to factors, and formulate his estimate
accordingly, but the overall degree of certainty is not transferred to the
final response. Since it is clear that the judgments of greatest interest are
those which are plagued to some extent by uncertainty, many decision analysts
have emphasized probabilistic judgments which contain an overt expression of
the estimator's certainty.

Probabilistic estimation is a different species from magnitude
estimation. Most of the theory and experimentation associated with
probability estimation arise from a different context, namely, theories of
rational choice. The latter have grown out of the theory of rational economic
behavior. By and large there has been less concern with the generation of
estimates, and more concern with the explicit representation of uncertainty,
the consistency of separate but related estimates, and revision (updating)
of estimates based on additional information.

These emphases have resulted from the close association of the theory
of decision with the calculus of probabilities. The calculus is not a
theory of specific probabilities, but a statement of the relationships
between probability assertions. Like formal logic, the calculus of
probabilities is empty. It does not deal with the correctness of specific
probability assertions, but rather is concerned with questions such as: given
the probabilities of some events, how do you compute the probabilities of
other, related events?

The approach to probabilistic estimation most closely related to
decision analysis is the theory of personalistic or subjective probability
associated with the names of Ramsey, de Finetti, and Savage. There has been
a welter of discussion about the signification of the probability estimates
defined by this theory: do they denote degrees of belief, propensities to
wager, shadow prices (trade-off weights) on events, and the like?

The question of signification has been complicated by an additional
issue, namely the so-called problem of the probability of a single event.
For most objective theories of probability, events which can be assigned a
probability are repeatable. Thus a coin can be flipped (theoretically) an
indefinite number of times. Subjective theories have been applied to events
which, at first glance, are not repeatable. In fact, the theories were
developed in part to meet an apparent requirement for dealing with uncertain
but non-repeatable events. For example, if we ask, "Will there be a major
nuclear war between the United States and the Soviet Union during the next
twenty years?" there is no question but that the reply is uncertain. Yet a
major nuclear war within the next twenty years is not a repeatable event in
the same sense in which a toss of a coin showing heads is repeatable.

The point of view I take on this topic is that so-called non-repeatable
events are theoretically repeatable. Thus, one can imagine a super entity
(cosmic scientist) conducting an experiment in which a set of earths is
configured to resemble the earth at present, including its human population
and political structure, and the entire set is allowed to run on for twenty
years, while the super being carefully tabulates the number of earths on
which nuclear wars occur. This grisly gedanke experiment doesn't appear to
violate any logical laws, and possibly no physical laws. Some events are
technologically repeatable (for present day humans) like the flip of a coin;
for others, there are continuing systems in which the events in fact repeat,
like various kinds of telephone calls; still others are repetitious, like
tides or seasons; some are repeatable, but very rare, like Richter magnitude
10 earthquakes, none of which has occurred in recent history. The interaction
of history and possibility provides a rich and fuzzy conglomeration of event
types. To try to organize all of these into one neat concept like the
collective of von Mises [14] is straining too hard.

Of particular interest are the nonobserved events which have very low
probabilities, but are not necessarily impossible, like the Richter magnitude
10 earthquake, or the delightful example of Frederick Mosteller, in a
reference that now escapes me, of the likelihood that a human being (under
present circumstances) will live to be 1000 years old. If the present
distribution of ages is taken seriously, then the probability of a
millenarian is not zero, though very, very small. In order to deal with these
types of events as "repeatable" it is necessary to envisage hypothetical
sequences of events, in short to perform gedanke experiments. The gedanke
experiment has a long and fruitful history in the physical sciences.
Reluctance to use the device in the social sciences stems, I suppose, from
the fragmentary condition of theory, and the fear that the imagination can
run wild with no firm theoretical constraints. But the theory of probability
is relatively well advanced, and gedanke experiments can be formulated in a
fairly well-disciplined manner.

The contention that so-called non-repeatable events are repeatable in
theory does not have the implication that probability is to be defined as a
relative frequency. Relative frequencies are, on this point of view, one
way to measure probabilities, in much the same sense that the position of a
column of mercury in a thermometer is one way to measure a temperature. The
temperature is not the height of the column of mercury. Of course, the
statement that the probability of an event is p has the consequence (derived
from the calculus of probability) that if the antecedent of the event (e.g.,
the flip of the coin for the event heads) is repeated, and the repetitions
are independent, then the event will occur with the relative frequency p in
the long run.
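The long-run consequence can be illustrated by simulating independent repetitions. This is only a sketch: the pseudo-random generator stands in for the repeated antecedent, and the function name `relative_frequency` is an illustrative assumption.

```python
import random

def relative_frequency(p, n_trials, seed=0):
    """Fraction of n_trials independent repetitions on which an event of
    probability p occurs, using a seeded pseudo-random generator."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_trials) if rng.random() < p)
    return hits / n_trials

# The measured frequency settles near p as the number of repetitions grows.
for n in (100, 10_000, 200_000):
    print(n, relative_frequency(0.5, n))
```

The frequency plays the role of the mercury column here: it is a reading that approaches p, not a definition of p.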

A probability  estimate,  on  this  point  of  view,  is  an  individual's  judg- 
ment of  an  objective  property  of  a system.  The  estimate  is  no  more  subjec- 
tive than  an  estimate  by  someone  of  the  height  of  a visible,  but  unmeasured 
tree,  or  an  estimate  of  the  width  of  a river  encountered  by  an  explorer 
without  a transit  and  chain. 

The basic datum for decision analysis is that individuals can and do
estimate the probabilities of relevant events in numerical terms. One
approach to a theory would be to start with that fact and ask "How good are
such estimates, and how useful are they for making decisions?" An obvious
problem, until recently, with this approach has been the difficulty of
specifying what is meant by a good probability estimate of a non-repeating
event. This issue will be examined more fully in the next chapter on scoring
methods.

The subjectivist theories start a little farther back, namely, with the
fact that individuals make choices and these choices are influenced by their
perception of the likelihood of events relevant to the choices. The theory
lays down certain criteria for the choices to be "good" and investigates the
consequences of these criteria for the properties of probability estimates.

Although the subjectivist point of view is somewhat at variance with
the general perspective of this book, it is useful to have an exposition of
the theory. It is a well thought out formulation of some of the criteria
for good decisions, and it furnishes some useful conceptual apparatus for
later investigations.

The theory begins with the notion of choice, or alternatively with the
notion of preference. The two are tied together by the assumption that if
the individual is presented with a choice out of a set of alternatives, he
will select the one he most prefers. The nature of the alternatives used
as a starting point by various theorists has differed somewhat: Savage
prefers starting with preferences among acts, for Ramsey it is goods, for
others it is the outcomes of acts, and for some it is reward value, or
utility of states of the world. Most of these starting points lead to about
the same conclusions.

* The exposition which follows is basically that of L. J. Savage for the
ordinal theory of subjective probability.

For the present exposition, I will use a rather bland concept which
will be designated by the term situation. A situation is a state of the
world viewed by an individual from the standpoint of his interests. To give
an illustration, a physicist might describe a chair as a complex assemblage
of atoms. For the average man, most of that description would be irrelevant
for everyday decisions such as whether to sit on the chair, or buy a new one,
and the like. This distinction is sometimes expressed by distinguishing
between state variables and criteria variables, where the latter are the
descriptors that are relevant to preferences. This formalization is
pertinent, but more intricate than is needed for the exposition of the theory
of subjective probability.* I use the term situation mainly to emphasize that
it is the interests of the individual that define the relevant objects.

We envisage a set X = {x, y, z, ...} of situations. In general these
will be potential situations; they are possibilities that may or may not
be realized. The basic assumption concerning X is that a given individual
has feelings about the relative desirability of different situations.
This can be expressed by saying that there is a preference relation on X.
For technical reasons it is convenient to start with the notion of "prefers
or is indifferent" rather than strict preference. Thus x ≥ y means the
individual either prefers x to y or is indifferent between them. (The
analogy with "greater than or equal to" is close, and is one reason for
borrowing the notation ≥.)

* For many situations of practical concern, the features which determine
desirability are not known. In the most elementary cases, it is necessary to
have a treatment which does not explicitly depend on criteria.

The two basic conditions determining ≥ are:

P1a. Connexity. For every pair x, y in X, either x ≥ y or y ≥ x.

P1b. Transitivity. For every triple x, y, z in X, if x ≥ y, and
y ≥ z, then x ≥ z.

P1a asserts that the preference relation is complete; for every pair
of situations, the individual knows whether he prefers one to the other,
or is indifferent. P1b is usually considered the property which makes
preference relations rational. Thus, for example, with P1a, it asserts
that preferences will not go around in a circle; it rules out x preferred
to y, y preferred to z and z preferred to x.

Corresponding strict notions can be defined.

D1a. Strict Preference. x > y means x ≥ y and not y ≥ x.

D1b. Equivalence. x ~ y means x ≥ y and y ≥ x.

It is easy to prove that for any pair x, y, one of three things holds:
either x > y or y > x or x ~ y.
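For a finite set of situations, the conditions P1a and P1b and the derived notions D1a and D1b can be checked mechanically. The sketch below is illustrative only; the function names and the two example relations (one built from a hypothetical ranking, one deliberately circular) are assumptions, not part of the theory.

```python
from itertools import product

def is_connex(items, weakly_prefers):
    """P1a: for every pair x, y either x >= y or y >= x."""
    return all(weakly_prefers(x, y) or weakly_prefers(y, x)
               for x, y in product(items, repeat=2))

def is_transitive(items, weakly_prefers):
    """P1b: x >= y and y >= z together imply x >= z."""
    return all(weakly_prefers(x, z)
               for x, y, z in product(items, repeat=3)
               if weakly_prefers(x, y) and weakly_prefers(y, z))

def strictly_prefers(weakly_prefers, x, y):
    """D1a: x > y means x >= y and not y >= x."""
    return weakly_prefers(x, y) and not weakly_prefers(y, x)

def equivalent(weakly_prefers, x, y):
    """D1b: x ~ y means x >= y and y >= x."""
    return weakly_prefers(x, y) and weakly_prefers(y, x)

def trichotomy_holds(items, weakly_prefers):
    """Exactly one of x > y, y > x, x ~ y holds for every pair."""
    return all(
        [strictly_prefers(weakly_prefers, x, y),
         strictly_prefers(weakly_prefers, y, x),
         equivalent(weakly_prefers, x, y)].count(True) == 1
        for x, y in product(items, repeat=2))

# A relation induced by a hypothetical ranking (higher number = better).
rank = {"x": 3, "y": 2, "z": 2}
ranked = lambda a, b: rank[a] >= rank[b]

# A circular relation: x over y, y over z, z over x (plus self-comparisons).
cycle_pairs = {("x", "y"), ("y", "z"), ("z", "x"),
               ("x", "x"), ("y", "y"), ("z", "z")}
circular = lambda a, b: (a, b) in cycle_pairs

items = ["x", "y", "z"]
print(is_connex(items, ranked), is_transitive(items, ranked))
print(is_connex(items, circular), is_transitive(items, circular))
```

The circular relation satisfies connexity but fails transitivity, which is the case P1b is meant to exclude.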

Included among situations is a type that will be called contingencies.
A contingency is a complex situation where the outcome depends upon the
occurrence of some event. For example, the condition of Los Angeles in the
year 1990 will depend upon the occurrence of a major earthquake between now
and then. The state of the pocketbook of a citizen rolling dice in a casino
in Las Vegas will depend upon the appearance of a seven in his first roll,
etc. This type of dependence will be expressed by the notation (x|E), read
"the situation x will obtain if the event E occurs." There is nothing
probabilistic about the dependence expressed by this notation. It could
be physical: "Given a major earthquake, a majority of the brick buildings
built before [year illegible] will be heavily damaged." It could be the
result of a social contract: "Given that the ball falls into a slot with the
same number as the one on which you put a $1,000 chip, I will give you chips
worth $36,000." Or it can be logical: "Given that Cleopatra was born a
peasant and her nose was 5 1/2 inches long, her nose was 5 1/2 inches long."

The expression (x|E) is incomplete in that it doesn't say what will
obtain if E does not occur. The notation (x,y|E) will be used to express
the more complete contingency (x|E) and (y|Ē), where Ē means "E does not
occur." For example, with the social contract on the roulette wheel, x
is "you get chips worth $36,000," y is "I take your chip and you get
nothing," E is "the ball falls in the slot with the same number as the one
on which you put a $1,000 chip," and Ē is "the ball falls in any other slot."
(x,y|E) will also be called a contingency.

More generally, if {E_i} is any partition of the universe of discourse
(complete and exhaustive division) and {x_i} a set of situations such that
each x_i is contingent on the corresponding E_i, the expression (x_1|E_1,
x_2|E_2, ..., x_n|E_n), abbreviated (x_i|E_i), represents an n-fold
contingency. Note that (x,y|E) is just (x|E, y|Ē). Complex contingencies can
be formulated where the situations are themselves contingencies. The prize in
a lottery can be another lottery ticket. Unfortunately for the neatness of
the theory, the distinction between situations which are not contingencies
and those which are needs to be maintained for the early stages. Situations
which are not contingencies will be called elementary. The idea expressed
by a contingency has analogies in all the subjective probability theories.
Thus, Ramsey uses the term wager, Savage uses act and consequence, von Neumann
and Morgenstern, probability combinations. Other cognate terms are
probability mixtures, lotteries, and de Finetti's random quantities. I am
sorry to add to all of this. There are some technical differences. For
example, (x,y|E) is not identical to (x,y|F) providing E ≠ F, even if the
probability of E is equal to the probability of F.

* There has been a massive (and not completely benign) neglect of this
relationship in the literature on the foundations of probability. It is not
the same as implication. Some of the problems will be discussed below.

To complete the building blocks, we need a set of events, U = {E, F, G, ...}.
U is the universe of discourse of events, and contains all the events worth
considering for a given problem. The symbol U will also be used to designate
the universal set, i.e., the set that includes all events. There should be
no problem with ambiguity here, since most of the references to U in the
following will be in the second sense. U is the same as the estimation
(event) space discussed earlier. All of the problems associated with
specifying estimation spaces apply to U.

Technically, U will be assumed to be an algebra of sets. This means U
contains all the sums and differences of members of U, and it contains the
null (empty) set ∅. For every set E, U also contains Ē (the complement of
E or not-E). In addition, we need the notion of joint occurrence of events,
E.F (both E and F) and the disjunction E v F (either E or F or both). Since
I will not be concerned with the fine structure of U, its properties will
not be spelled out in axiomatic form. The interested reader can get details
from any text on the theory of sets or any book on measure theory.

The distinction between U and X is not as sharp as one might wish.
Normally, the distinction is made in terms of control; the events in U are
those which are not under the control of a given individual or group, those
in X can at least be influenced, hence they are often called consequences
or outcomes. However, this distinction breaks down in many types of analysis.
The basic distinction on the present approach is one of evaluation. The
items in X are those for which the individual has a clear preference; they
"make a difference." The items in U make a difference through their effects
on items in X.

To summarize, the elements of the theory are: the set of situations X,
the preference relation ≥, the set of events U, and the operation (x_i|E_i)
which generates contingencies. P1a and P1b specify the properties of ≥.
P2 extends P1 to contingencies.

P2. Closure for contingencies. Given any set of situations {x_i} and
a corresponding (equi-numerous) set of events {E_i}, which is a partition of
U, the contingency (x_i|E_i) is in X.

P2 asserts that the individual has preferences for contingencies as
well as for non-contingent situations, and in light of P1, any contingency
can be compared with any situation. The significance of the wholesale
independence of elementary (non-contingent) situations and events will be
discussed below.


P3.

(a) (x,y|E) ~ (y,x|Ē)

(b) (x,y|∅) ~ (y,x|U) ~ y

(c) (x,x|E) ~ x

(d) ((x,y|E),y|F) ~ (x,y|E.F)

(e) (x,(x,y|E)|F) ~ (x,y|E v F)
P3 expresses a number of properties which are immediate consequences
of the "meaning" of (x,y|E). It is redundant, in the sense that some of the
properties can be derived from the others. However, to do this with entire
rigor, it would be necessary to axiomatize the pertinent parts of set theory,
which I promised not to do above. (a) simply emphasizes that in (x,y|E),
the situation x is contingent on the occurrence of E, and y is contingent
on the occurrence of Ē. (b) states the obvious, that any situation
contingent on the null set is never realized, and conversely, any situation
contingent on the universal set is always realized. (c) is equally obvious.
A situation contingent on either an event or its negation is always realized.
(d) and (e) express the appropriate representation of complex contingencies,
as can be seen from the diagrams.


Figure 7. Compound Contingencies (panels (d) and (e))
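The five properties of P3 can be verified on a small finite universe by following each contingency down to the elementary situation that obtains in every state. This is a sketch under stated assumptions: the `Contingency` class, the `resolve` and `same_outcomes` helpers, and the six-state universe are hypothetical, and "same outcome in every state" is used as a stand-in for the equivalence ~ (it is a stronger condition).

```python
from dataclasses import dataclass

UNIVERSE = frozenset(range(6))  # hypothetical six-state universe

@dataclass(frozen=True)
class Contingency:
    """(x, y | event): x obtains if the event occurs, otherwise y."""
    x: object
    y: object
    event: frozenset

def resolve(situation, state):
    """Follow nested contingencies down to the elementary situation."""
    while isinstance(situation, Contingency):
        situation = situation.x if state in situation.event else situation.y
    return situation

def same_outcomes(a, b):
    """True if a and b yield the same elementary situation in every state."""
    return all(resolve(a, s) == resolve(b, s) for s in UNIVERSE)

E = frozenset({0, 1, 2})
F = frozenset({2, 3})
not_E = UNIVERSE - E
C = Contingency

checks = {
    "a": same_outcomes(C("x", "y", E), C("y", "x", not_E)),
    "b": same_outcomes(C("x", "y", frozenset()), "y")
         and same_outcomes(C("y", "x", UNIVERSE), "y"),
    "c": same_outcomes(C("x", "x", E), "x"),
    "d": same_outcomes(C(C("x", "y", E), "y", F), C("x", "y", E & F)),
    "e": same_outcomes(C("x", C("x", "y", E), F), C("x", "y", E | F)),
}
print(checks)
```

Set intersection `E & F` plays the role of E.F and set union `E | F` the role of E v F.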


P3b suggests a more general notion, namely that of a null event.
Roughly, the idea is an event whose probability is zero, even though it
may not be empty.

D2. E is null means (x,y|E) ~ y for every x and y.

P4. Dominance. If {x_i} and {y_i} are sets of elementary situations,
and x_i ≥ y_i for every i, then (x_i|E_i) ≥ (y_i|E_i). If, in addition,
x_j > y_j for some j, and E_j is not null, then (x_i|E_i) > (y_i|E_i).
P4 is a familiar axiom in decision theory. It asserts that given two
contingencies, if the situation in the first contingency is at least as good
as the situation in the second, no matter which event occurs, then the first
is at least as desirable as the second. If for at least one of the events,
the situation is strictly better in the first contingency than in the second,
then the first contingency is strictly preferred to the second, providing,
of course, that that event is not null.

The postulate is formulated for elementary situations rather than for
all members of X for the simple reason that it does not hold if some of the
situations are themselves contingencies. I don't believe this fact has been
noted in most previous formulations of subjective probability theory; but
it is difficult to be sure because many of the manipulations which depend
on the logical properties of sets are left implicit. The difficulty will
be illustrated by a simple example. Suppose one contingency is that the
individual receives a dollar if it doesn't rain, otherwise nothing, so
C1 = ($1,0|R̄). The other contingency is C2 = (x,0|R̄) where x = ($10,0|R), i.e.,
if it doesn't rain the individual will receive $10 if it rains. This complex
contingency can be evaluated using P4, but the reader is probably well
ahead of the analysis. The second contingency is worth precisely 0 despite the
fact that x > 0. In general, P4 holds for both elementary situations and
contingencies in the case that the events in the primary partition are
independent of the events involved in the sub-contingencies; however, the
notion of independence cannot be defined with the conceptual structure
developed up to this point. The notion of independence can be defined for
arbitrary events only within the context of numerical probabilities, which
we won't get to for several pages. The fact that independence cannot be
defined for ordinal probabilities, even with a fixed U, appears to be a
deep property of the subjectivist approach which supplements the comments
in Section 2 concerning the relativity of independence to a specific universe
of discourse.
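The rain example can be checked mechanically. The following is a minimal sketch, not part of the formal development: the two-state universe, the dollar coding of situations, and the helper name payoff are assumptions made only for illustration.

```python
# Hypothetical illustration: states of the world are "rain" and "no_rain".
# A contingency (a, b | E) yields a if E occurs and b otherwise; a or b may
# itself be a contingency, which is then resolved in the same state.

def payoff(contingency, state):
    """Resolve a possibly nested contingency in a given state of the world."""
    if not isinstance(contingency, tuple):
        return contingency                  # elementary situation (dollars)
    if_true, if_false, event = contingency
    branch = if_true if state in event else if_false
    return payoff(branch, state)

RAIN = {"rain"}
NO_RAIN = {"no_rain"}
STATES = ["rain", "no_rain"]

c1 = (1, 0, NO_RAIN)     # $1 if it doesn't rain, otherwise nothing
x = (10, 0, RAIN)        # $10 if it rains, otherwise nothing
c2 = (x, 0, NO_RAIN)     # x if it doesn't rain, otherwise nothing

payoffs_c1 = {s: payoff(c1, s) for s in STATES}
payoffs_c2 = {s: payoff(c2, s) for s in STATES}
print(payoffs_c1)   # {'rain': 0, 'no_rain': 1}
print(payoffs_c2)   # {'rain': 0, 'no_rain': 0}
```

Although x taken alone is better than nothing, the nested contingency yields nothing in every state, because the event on which x pays is excluded by the event on which x is received.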

The next assumption is intended to give substantiality to the notion
of one event being more probable than another. Suppose you are offered a
choice between two contingencies, C1 = (x,y|E) and C2 = (x,y|F), where x > y.
Since x is preferred to y, you would prefer that it be contingent on the more
likely event. Thus, if you feel that C1 is preferable to C2, this is prima
facie evidence that you think E is more likely than F. As an obvious example,
if C1 is "you get $10 if a head shows on a flip of a coin, otherwise nothing"
and C2 is "you get $10 if a five shows on a roll of a die, otherwise nothing,"
you would in all likelihood select C1.

This approach to perceived relative likelihood wouldn't be worth much
if you changed your feelings depending on the kind of reward. P5 is intended
to assure the requisite stability. As in P4, we have to restrict the postulate
to elementary situations. It is patently false if asserted for situations
which are themselves contingencies.

P5. Stability. If x,y,z,w are elementary situations and if x > y and
(x,y|E) > (x,y|F), then if z > w, (z,w|E) > (z,w|F).

The postulate looks more complicated than it is; it merely asserts that
if you prefer the more valuable of two particular situations to be contingent
on the event E rather than the event F, then for any other pair of situations
you would prefer the more valuable to be contingent on E.

Savage defends P5 on the grounds that preferences for contingencies
should not be dependent on the size of the prizes, no matter how small, as
long as one is definitely preferred to the other. I am inclined to think
that the question of absolute size of prizes is a bit of a red herring,
comparable to the question whether individuals make estimates which are
"really continuous." In essence P5 assures that the following definition
won't create trouble when situations are shuffled in contingencies.

D3. E ≥ F (read "E is at least as probable as F") means that, given
x,y are elementary situations and x > y, (x,y|E) ≥ (x,y|F).

The use of the same symbol > to indicate the preference relation between
situations and the relation more probable between events should not be too
bothersome, since the two uses will be distinguished by lower case letters
for situations and upper case letters for events.

In order to assure that D3 is not empty, it is necessary to assert the
trivial assumption that there is at least one pair of situations x, y such
that x > y. I'm willing to make that assumption without dignifying it with
a P number.

P1-5 and D1-3 are sufficient to establish what could be called the pure
ordinal theory of subjective probability. As we shall see in a moment, they
determine for any two events E and F that the individual has a consistent
judgment as to which is the more probable. This judgment lays out all events
(in U) in a serial order, with the null event 0 at the low end, and the universal
event U at the upper end. As should be the case, the disjunction
E v F of any two events is at least as probable as either, and either is at
least as probable as the conjunction E.F.

Theorem 1. ≥ is a complete ordering for events, that is, it is connected
and transitive.

Proof: Let E and F be any two events. For connexity, consider
any pair of situations x > y. By P1, either (x,y|E) > (x,y|F) or
(x,y|E) ~ (x,y|F) or (x,y|E) < (x,y|F). Using D3 the corresponding relationship
is transferred to E and F. For transitivity, E ≥ F and F ≥ G means
there is a pair x,y with x > y and (x,y|E) ≥ (x,y|F), and there is a pair z,w
with z > w and (z,w|F) ≥ (z,w|G). From P5 we get (x,y|F) ≥ (x,y|G). Hence from P1
(transitivity) we conclude (x,y|E) ≥ (x,y|G), and D3 implies E ≥ G.

Theorem 2. U > 0.

Proof: Assume x > y. (x,y|U) ~ x and (x,y|0) ~ y by P3b.
Hence U > 0 by D3.

Theorem 3. 0 ≤ E ≤ U.

Proof: Assuming x > y, (x,y|0) ~ y ~ (y,y|E) ≤ (x,y|E) ≤ (x,x|E) ~ x ~ (x,y|U).
The equalities are from P3, the inequalities from P4.

Theorem 4. E v F ≥ E ≥ E.F and E v F ≥ F ≥ E.F.

Proof: Consider the table

                        E.F    E.F̄    Ē.F    Ē.F̄
   C1 = (x,y|E v F)      x      x      x      y
   C2 = (x,y|E)          x      x      y      y
   C3 = (x,y|F)          x      y      x      y
   C4 = (x,y|E.F)        x      y      y      y

If x > y, then C1 dominates C2 and C3, which in turn dominate C4, all by P4.
The table entries, e.g., C2 = (x,y|E) ~ (x|E.F, x|E.F̄, y|Ē.F, y|Ē.F̄), follow
from the logical rule E = E.F v E.F̄ and P3.
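The dominance argument in this proof can be checked cell by cell. The sketch below is only an illustration under stated assumptions: x is coded as 1 and y as 0, the four cells of the partition are coded as pairs of truth values, and the helper names contingency and dominates are hypothetical.

```python
from itertools import product

# Cells of the partition generated by E and F, coded as (in_E, in_F):
# E.F, E.not-F, not-E.F, not-E.not-F, in that order.
CELLS = list(product([True, False], repeat=2))

def contingency(event):
    """Row for (x, y | event): x (coded 1) on the event, y (coded 0) off it."""
    return [1 if event(e, f) else 0 for e, f in CELLS]

def dominates(a, b):
    """a is at least as good as b in every cell (the dominance condition)."""
    return all(p >= q for p, q in zip(a, b))

c1 = contingency(lambda e, f: e or f)    # (x, y | E v F)
c2 = contingency(lambda e, f: e)         # (x, y | E)
c3 = contingency(lambda e, f: f)         # (x, y | F)
c4 = contingency(lambda e, f: e and f)   # (x, y | E.F)

print(c1, c2, c3, c4)
print(dominates(c1, c2), dominates(c1, c3), dominates(c2, c4), dominates(c3, c4))
```

Note that C2 and C3 do not dominate one another, which is why the theorem says nothing about the relative order of E and F themselves.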

There is a tendency in subjectivist theories of probability to use the
theory of ordinal probability simply as a stepping stone to the more familiar
numerical probability. This may be too hasty a leap. As we have seen,
numerical probability runs into difficulties as soon as we try to move from
one universe of discourse to another. In addition, as will crop up later,
numerical probabilities do not seem to rationalize some kinds of choice
behavior when the amount of information concerning uncertain events becomes
too sparse. We can ask, does the set of postulates and definitions we have
just gone through remain valid for these "non-normal" conditions? We will
return to this theme in Chapter IV on nominal judgments.

The final postulate which bridges the gap between purely ordinal probability
and numerical probability is the somewhat more controversial "sure-thing"
principle. This postulate introduces a form of strong independence
between events and situations that furnishes the basis for the additivity of
probabilities for exclusive events. To formulate the postulate in our notation
we need an auxiliary idea.

D4. C and D agree on E if there is a partition {E_i} of E, such that
C = ((x_i|E_i), r|Ē) and D = ((y_i|E_i), s|Ē) and x_i ~ y_i for every i. r|Ē and
s|Ē are shorthand for "C and D can be anything on Ē."

P6. (Sure-thing) If C1 agrees with C2 on Ē and C3 agrees with C4 on Ē, and
C1 agrees with C3 on E and C2 agrees with C4 on E, then, if C1 > C2, C3 > C4.

The intention of P6 is probably clearer when displayed in a diagram.

             E    Ē
      C1     r    s
      C2     t    s
      C3     r    u
      C4     t    u

If C1 > C2, then C3 > C4.

"...for here I have little interest in qualitative probabilities, except
as a foundation for quantitative probability." L. J. Savage, p. 45.

Here r, s, t and u are not situations, but abbreviations for "whatever pattern
of situations obtains for subsets of E or subsets of Ē as the case might
be."*

The basic intent of the postulate is expressed by the diagram. Whatever
makes C1 preferable to C2 must involve only E, because the two are identical
on Ē. But then C3 and C4 are also identical on Ē, and therefore should be
affected by exactly the same considerations that made C1 preferable to C2.
The reason for the name "sure-thing" should be clear.

It is unfortunate that the postulate assumes such an intricate form,
since the basic notion is quite simple. "The only features of two contingencies
that make a difference are those parts where they are different."

P6 furnishes the theorem

Theorem 4. E ≥ F if and only if E v G ≥ F v G, providing E.G = F.G = 0.

Proof: Consider the diagram, where as usual x > y.

                        E.F    E.F̄    Ē.F     G     Ē.F̄.Ḡ
   C1 = (x,y|E)          x      x      y      y      y
   C2 = (x,y|F)          x      y      x      y      y
   C3 = (x,y|E v G)      x      x      y      x      y
   C4 = (x,y|F v G)      x      y      x      x      y

G is included in Ē.F̄ by assumption. E ≥ F if and only if C1 ≥ C2, and
E v G ≥ F v G if and only if C3 ≥ C4. Set E.F̄ v Ē.F = H and E.F v Ē.F̄ = H̄.
C1 agrees with C2 on H̄, C3 agrees with C4 on H̄, C1 agrees with C3 on H,
and C2 agrees with C4 on H. Thus the conditions of P6 are fulfilled, and
if C1 > C2, then C3 > C4. Since the conditions are symmetrical, the reverse
also holds. Hence by D3 the theorem follows.

* If r,s,t,u are construed as elementary situations, then P6 is just a special
case of P4, dominance, since C1 > C2 implies r > t, whence C3 > C4.

Theorem 4 is the analogue for ordinal probabilities of the additivity
of numerical probabilities for exclusive events. It permits "cancellation"
of G in the inequality E v G ≥ F v G. An important corollary of Theorem 4 is

Corollary 1. E > F implies F̄ > Ē.

Proof: Consider the table, with x > y

                     E.F    E.F̄    Ē.F    Ē.F̄
   C1 = (x,y|E)       x      x      y      y
   C2 = (x,y|F)       x      y      x      y
   C3 = (x,y|F̄)       y      x      y      x
   C4 = (x,y|Ē)       y      y      x      x

Define H and H̄ as in the proof for Theorem 4.
E > F implies C1 > C2. C1 agrees with C2 on H̄ and C3 agrees with C4 on H̄.
C1 agrees with C3 on H and C2 agrees with C4 on H. Thus, by P6, C3 > C4,
that is, F̄ > Ē.

Turning to numerical probabilities, the elementary calculus of probabilities
is remarkably simple. You can get by with the following three
assumptions:

A1. 0 ≤ P(E)

A2. P(U) = 1

A3. P(E v F) = P(E) + P(F), providing E.F = 0

There are a number of routes to take to A1-3. I will outline what
appears to be the simplest of the procedures. More complete treatments are
found in Savage and de Finetti. D3 suggests a natural definition for the
notion probability 1/2.

D5. P(E) = 1/2 means, given x > y, (x,y|E) ~ (x,y|Ē), i.e., the individual
is indifferent whether the more valuable alternative is contingent on E or Ē.

Corollary 2. If P(E) = 1/2 and P(F) = 1/2, then E ~ F.

Proof: By Corollary 1, if E > F, then F̄ > Ē. By definition of P(E) = 1/2,
E ~ Ē, and similarly F ~ F̄, whence F > E, contrary to assumption. The same
reasoning rejects F > E.

Although it does not seem possible to define the notion of independence
for pairs of arbitrary events within the ordinal theory of probability, it
is possible to define the notion for the special case that one of the events
has probability 1/2.

D6.* Given P(E) = 1/2, E is independent of F means E.F ~ Ē.F.

D6 can be extended to express the notion E independent on repetition,
providing the probability of E is 1/2.

D7. Given P(E) = 1/2, E is independent on repetition means: Let X_n
designate a conjunctive sequence of n terms consisting of E's and Ē's in
any proportion and any order; e.g., an X_3 might be E.Ē.E. For any n, and
any X_n, X_n.E ~ X_n.Ē.

Theorem 5. If there exists an event E such that P(E) = 1/2 and E is
independent on repetition, then there exists a unique mapping of U onto
the real interval, such that A1-A3 hold.

Proof: It is elementary, but tedious, to prove that the hypothesis of
the theorem implies there is a 2^n-fold equipartition of U for every integer
n. Denote a member of the 2^n-fold equipartition by X_n, and the logical sum
of any m of these by X_n,m. For any F, either F ~ X_n,m for some n and m, or
there is an infinite sequence of intervals X_n,m+1 > F > X_n,m. Define P(F)
to be m/2^n in the first instance, otherwise the limit of the sequence of
intervals as n → ∞. This definition maps U onto the real
interval [0,1], with P(U) = 1. The mapping is unique in the sense that for
any two events that have probability 1/2 and are independent on repetition
the identical mapping is generated. A3 follows from Theorem 4 and the additivity
of the reals defined as limits of sequences of intervals.

* This definition can be related to the usual definition of independence,
namely P(E.F) = P(E)P(F), by noting that independence implies P(Ē.F) =
P(Ē)P(F), and if P(E) = P(Ē), we arrive at D6.

Theorem 5 motivates the assumption

P7. There is an event E, P(E) = 1/2, and E is independent on repetition.*

Although Theorem 5 is in some sense an adequate basis for numerical
probabilities, it does not assure a total fit between ordinal probabilities
and the numerical mapping. It implies that if P(E) > P(F) then E > F, but
not the reverse. In particular, it does not exclude E > F and P(E) = P(F).
To rule out this possibility, an additional assumption is needed. This
assumption, ruling out infinitesimal differences in probabilities, is common
in measurement theory.

P8. (Archimedean) If E > F, then, for G such that P(G) = 1/2, independent
on repetition, there is an X_n,m such that E > X_n,m > F.


* Most investigators in the foundations of probability would probably find P7
overly specialized, "weak," and possibly old-fashioned. The same may be
true of P8, below. I have preferred these two to more powerful assumptions
on the grounds that, given Theorem 4 (existence of an ordinal probability
scale), the basic issue appears to be attaching a numerical scale to probabilities
with some psychological content. P7 and P8 seem to express widely
accepted attitudes about events like observing a head on the flip of a
coin.

P8 assures that the strict inequality E > F means there is a finite
difference between E and F. It also assures that E and F will be mapped
onto different numbers.
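The measurement procedure implied by the equipartition construction amounts to bracketing an event between successively finer dyadic events. The following is a rough sketch under stated assumptions: the function name dyadic_probability and the oracle at_least_as_probable are hypothetical stand-ins for the individual's ordinal judgments, and a hidden numerical value is used here only to simulate those judgments.

```python
from fractions import Fraction

def dyadic_probability(at_least_as_probable, depth=20):
    """Bracket P(F) between m/2**n and (m+1)/2**n for n = 1..depth.

    at_least_as_probable(m, n) is a hypothetical oracle standing in for the
    ordinal judgment "F is at least as probable as X_n,m", where X_n,m is the
    disjunction of m members of the 2**n-fold equipartition.
    """
    m = 0
    for n in range(1, depth + 1):
        m *= 2                              # X_(n-1),m is the same event as X_n,2m
        if at_least_as_probable(m + 1, n):  # is F at least as probable as X_n,m+1?
            m += 1
    return Fraction(m, 2 ** depth), Fraction(m + 1, 2 ** depth)

# Simulated judgments that happen to agree with a hidden value of 3/10.
hidden = Fraction(3, 10)
low, high = dyadic_probability(lambda m, n: hidden >= Fraction(m, 2 ** n))
print(float(low), float(high))
```

Each comparison halves the bracket, so twenty ordinal judgments locate the number to within about one part in a million. Whether an individual can actually supply that many consistent judgments is a separate question.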

This  is  probably  a good  place  to  list  several  definitions  and  formulae 
that  can  be  derived  from  A1-A3  which  will  be  used  in  later  sections. 

D7. Relative probability. P(E|F) = P(E.F)/P(F)

Read "The probability of E given that F occurs."

D8. Independence. E and F are independent means P(E.F) = P(E)P(F),
or equivalently, P(E|F) = P(E).

F1. Extended rule of addition. P(E v F) = P(E) + P(F) - P(E.F)

F2. Rule of the product. P(E.F) = P(E)P(F|E) = P(F)P(E|F)

F3. Rule of elimination. If {F_i} is an exclusive and exhaustive
partition of U, P(E) = Σ_i P(F_i)P(E|F_i)

F4. Theorem of Bayes. If {H_i} is an exclusive and exhaustive partition
of U, P(H_i|E) = P(H_i)P(E|H_i) / Σ_j P(H_j)P(E|H_j)
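These formulae can be verified on any small universe. The sketch below assumes, purely for illustration, a universe consisting of one roll of a fair die; the events, the partition, and the helper names P, P_given and bayes are not taken from the text.

```python
from fractions import Fraction

# Hypothetical universe: one roll of a fair die, each face with probability 1/6.
U = frozenset(range(1, 7))

def P(event):
    return Fraction(len(event & U), len(U))

def P_given(e, f):
    """Relative probability P(E|F) = P(E.F)/P(F)."""
    return P(e & f) / P(f)

E = frozenset({2, 4, 6})          # an even face
F = frozenset({4, 5, 6})          # a face greater than three
partition = [frozenset({1, 2}), frozenset({3, 4}), frozenset({5, 6})]

f1 = P(E | F) == P(E) + P(F) - P(E & F)                      # rule of addition
f2 = P(E & F) == P(E) * P_given(F, E) == P(F) * P_given(E, F)  # rule of the product
f3 = P(E) == sum(P(h) * P_given(E, h) for h in partition)    # rule of elimination

def bayes(i):
    """F4: probability of partition[i] given E, computed from the other terms."""
    total = sum(P(h) * P_given(E, h) for h in partition)
    return P(partition[i]) * P_given(E, partition[i]) / total

f4 = all(bayes(i) == P_given(partition[i], E) for i in range(len(partition)))
print(f1, f2, f3, f4)
```

Exact fractions are used so that the equalities hold without rounding error.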

There is a certain reluctance to accept idealizations like an event with
probability 1/2 independent on repetition as a basis for probability measurements.*
Idealizations in other areas of measurement are not so suspect: perfectly
rigid and indefinitely divisible rods, isochronous clocks, and the
like. I would suspect that the reason is not so much the idealization as the
fact that in practice, probabilities are not measured by comparison with some
set of equiprobable events, but rather are measured by relative frequencies.
There is no standard chance device, e.g., a platinum-iridium penny, at the
International Bureau of Weights and Measures at Sèvres, France.

* "it might fairly be objected that such a postulate would be flagrantly ad
hoc." Savage, Foundations, p. 33.


The  world  described  by  postulates  P1-P8  is  a very  simple  place.  It  is 
essentially  the  world  of  the  gambler,  where  the  interesting  events  are  loosely 
coupled to the interesting rewards. In gambling, the coupling is effected by
a social contract, not by physical interaction. To dramatize this point,
P3 says that the contingency (x,x|E) exists, whatever x and whatever E. But
suppose  E is  "The  sun  goes  nova  tomorrow."  What  possible  contingency  could 
there  be  that  makes  the  outcome  of  the  sun  going  nova  the  same  as  the  sun 
not  going  nova? 

Although there have been attempts to model the "real" world, where events
influence the relevant outcomes in a direct physical way, to my knowledge
none of these has been successful. Savage begins his theory with something
that looks very much like the real world, but he has to abandon it rather
quickly. He needs the notion of "constant act" (that is, an act that produces
the same consequences irrespective of the state of the world) in order
to formulate the equivalents of P4 and P6. Thus, his definition of ordinal
probability is formulated within the (unexpressed) restrictions of an assumption
that is very much like P3.

The  situation  in  probability  theory  is  not  too  different  from  what  it  is 
in  many  other  measurement  theories.  It  is  recognized  by  physicists,  for 
example,  that  the  elementary  definition  of  length  in  terms  of  juxtaposing  a 
sequence  of  equal-length  rigid  rods  is  feasible  only  in  a limited  geographical 
region. To measure lengths over more extended regions, e.g., to the planet
Mars, complicated apparatus and complicated theories must be invoked. To
extend measures to intergalactic distances, rather shaky assumptions concerning
the period and intrinsic brightness of variable stars must be made.

There  is  no  reason  for  suspecting  that  the  world  is  any  more  tractable 
when  it  comes  to  probability  measurements.  Defining  the  elementary  notion 
of  probability  measure  in  terms  of  gambling-like  situations  does  not  imply 
that  the  same  type  of  measurement  extends  to  any  situation  where  we  would 
like  to  use  the  term  probability. 

The subjective theory of probability goes well beyond simple estimation
of probabilities and includes a relatively complete theory of individual
decisions. The extension of subjective probability theory to include numerical
utilities is a relatively minor step, and in fact in the form developed
by Ramsey and de Finetti, the theory of numerical utilities precedes and
forms the basis for the finalization of a numerical theory of probabilities.
The intimate tie between subjective probability theory and the complete
theory of decisions is both a strength (it allows displaying the role of
probabilities in decisions in a simple way) and a source of awkward consequences.
If human decisions as observed, e.g., in the psychological laboratory,
do not accord with the theory, it is not always clear whether the
disparity involves the narrower concept of perceived probability, or the
more general theory of decision in which it is embedded.

As presented by its founders, and most of those who want to apply it,
the subjective theory has been given a kind of universality which is unnecessary
for our purposes. Thus, for true-blue subjectivists, any individual (at
least any one beyond the age of accountability) has a clear perception of the
probability (from his perspective) of any event whatsoever. Furthermore,
this probability is precisely the "correct" probability to guide any of his
decisions to which the probability in question is relevant. Both of these
appear to be unnecessarily strong assumptions. For our purposes, it looks
sufficient to assume that for some universes of discourse, individuals have
relatively clear perceptions of the probability distributions on those universes.
Whether the perceived probabilities are "correct" is a quite different
matter. We assume they can be incorrect in much the same sense in
which guesstimates of any other physical quantity can be incorrect.

6. Calibration

The subjectivist theory of probability is essentially a theory of consistent
probability estimates. The tie between the estimates postulated by
the theory and reality is loosely drawn. By and large those who wholeheartedly
embrace the theory become restive, if not downright surly, when
the subject of correctness of probability judgments is raised. In part this
appears to involve a feeling that probabilities are not part of the world,
but are measures in some not fully specified sense of the amount of information
which an individual has concerning the predicted event. Clearly, two
different individuals with different information may announce quite different
estimates of the probability of a given event. There is no pathology in
this; it is analogous to the fact that P(E|F) may be quite different from
P(E|G).

Nevertheless, there is a straightforward sense in which an individual
can simply be mistaken in making a probability judgment. Almost everybody
is agreed that the French mathematician D'Alembert was mistaken in asserting
that the probability is one-third of obtaining a head and a tail in two
tosses of a fair coin. Furthermore, if he had bet on that assumption, everyone
is agreed that his shirt would have been in jeopardy. The problem is
to find a clear and general way to state what is meant by saying an individual
is correct or incorrect in making a probability judgment. If the weather
forecaster says that the probability of rain tomorrow is .2, and it pours,
what crime can we accuse him of? He didn't say it wouldn't rain, just that
it was unlikely. And on almost any concept of probability, unlikely things
must happen once in a while. This topic is explored more thoroughly in the
following chapter on scoring.
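The sense in which the one-third estimate is simply mistaken can be shown by enumeration. In the sketch below the particular bet (laying 2 to 1 against a mixed result) is an assumption chosen for illustration; it is the bet that would look fair to someone who believed the probability to be one-third.

```python
from fractions import Fraction
from itertools import product

# The four equally likely outcomes of two tosses of a fair coin.
outcomes = list(product("HT", repeat=2))
mixed = [o for o in outcomes if set(o) == {"H", "T"}]
p_mixed = Fraction(len(mixed), len(outcomes))

# Hypothetical bet: believing the chance is 1/3, he lays 2 to 1 against a
# mixed result, paying 2 units if it occurs and collecting 1 unit otherwise.
believed = Fraction(1, 3)
payout_if_mixed = (1 - believed) / believed
expected_gain = (1 - p_mixed) * 1 - p_mixed * payout_if_mixed

print(p_mixed, expected_gain)   # 1/2 -1/2
```

On this assumed bet he loses half a unit per play on the average, which is the jeopardy referred to above.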

A different approach to this issue that has received a fair amount of
attention lately is the notion of calibration. Given a set of probability
estimates by an individual, and a set of data concerning the occurrence or
nonoccurrence of the predicted events, it is possible to get a rough idea
how good the individual's predictions are. Thus, if a subset of the predictions
is selected, all predicting some event with the same probability
R, then over many such predictions, the relative frequency F with which
those events occur should settle down to about R. More generally, if the
individual generates many estimates with different probabilities, the relative
frequency with which the events occur should approximate the dotted
45 degree line in Figure 8.
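The tabulation described here is straightforward to carry out. The following is a minimal sketch; the function name calibration_curve and the ten data points are made up for illustration and are not taken from any study.

```python
from collections import defaultdict

def calibration_curve(estimates, outcomes):
    """Return {R: F(R)}: for each stated probability R, the relative
    frequency F with which the predicted events actually occurred."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for r, occurred in zip(estimates, outcomes):
        counts[r] += 1
        hits[r] += int(occurred)
    return {r: hits[r] / counts[r] for r in sorted(counts)}

# Made-up data for illustration only (1 = event occurred, 0 = did not).
estimates = [0.6, 0.6, 0.6, 0.6, 0.6, 0.9, 0.9, 0.9, 0.9, 0.9]
outcomes = [1, 1, 1, 0, 0, 1, 1, 1, 0, 0]
curve = calibration_curve(estimates, outcomes)
print(curve)   # {0.6: 0.6, 0.9: 0.6}
```

In this made-up case the estimates of .6 fall on the 45 degree line, while the estimates of .9 fall well below it.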

In actual experimental studies, the observed results are usually quite
far from the theoretically "correct" 45 degree line. In Figure 9 the solid
line is a plot of the data collected by Capen. The subjects were 43
engineers. Each subject answered a set of 120 questions. Responses consisted
of a true-false judgment and a probability estimate that the selected
response was correct. Responses were restricted to the round numbers
.5, .6, ..., .9, 1.0; the restriction to responses greater than or equal to .5
resulted from the assumption that a subject would select the alternative
that had, for him, the higher subjective probability. The questions were a
mixture of professionally relevant and general information types. The graph
shows the average proportion of correct responses to total number of responses
with a given probability.

[Figure: vertical axis F(R), proportion correct]

Figure  9 is  an  average  over  the  43  subjects,  and  is  somewhat  smoother 
than  what  is  observed  for  a single  subject.  Surprisingly  erratic  data  can 
be  observed  with  a single  subject.  Figure  10  is  an  example. 

A quick glance at Figure 9 indicates that the engineers are not doing
very well in making probability estimates. As we shall see later, the
majority of the subjects would have done better if they had expressed complete
ignorance about every question, i.e., if they had always estimated
.5. For estimates of .5 and .6 the average proportion correct is not significantly
different from the theoretical proportions; but for .7 and greater,
the average proportion is much smaller than the estimates.

The  term  "calibration"  has  been  used  in  two  related  senses.  In  one 
sense,  the  term  has  been  used  to  refer  to  the  fact  that  an  F(R)  curve  like 
the  solid  curve  in  Figure  9 has  been  observed  (or  estimated  by  someone  else) 
for  the  individual.  In  this  sense  the  individual  has  been  calibrated  in 
much the same way that an instrument is calibrated when its response curve
to the quantity it is intended to measure is known. In the other sense,
an  individual  has  been  called  calibrated  if  his  F(R)  curve  has  been  observed 
to  lie  on  the  45  degree  line.  Sometimes  the  term  "fully  calibrated"  is 
used  for  this  meaning. 

Another term for the observed F(R) curve is the Realism curve. An individual
is called "realistic" if his curve matches the theoretically correct
curve, otherwise unrealistic to the extent it departs from the theoretical.

Figure 10. Individual Calibration (Data From Capen, Subject 29, 120 Questions)


Figure 9 is fairly typical of the results obtained by many investigators
in this field. In general, there is a tendency for subjects to
fall below the 45 degree line for estimates exceeding .5, and to remain
above it for estimates less than .5. Some differences in conventions of
counting have led to apparent discrepancies with this finding. Given a
sequence of sentences, and the corresponding sequence of estimates of the
probability that the sentences are true, the relative frequency of true can
be plotted against the estimate. This will produce one kind of F(R) graph.
However, any given estimate can be interpreted as a pair of estimates: one
estimate of the probability that the sentence is true, and an implied estimate
of the probability that the sentence is false. If the individual is
consistent in his estimates, then F(1-R) = 1 - F(R), and the curve must be
skew symmetric about .5.*
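The paired counting convention can be made concrete. In this sketch the function name symmetric_curve and the six responses are hypothetical; exact fractions are used so that R and 1 - R match without rounding.

```python
from collections import defaultdict
from fractions import Fraction

def symmetric_curve(estimates, truths):
    """F(R) under the paired convention: each estimate R that a sentence is
    true is also counted as an estimate 1 - R that the sentence is false."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for r, is_true in zip(estimates, truths):
        for prob, occurred in ((r, is_true), (1 - r, not is_true)):
            counts[prob] += 1
            hits[prob] += int(occurred)
    return {p: Fraction(hits[p], counts[p]) for p in counts}

# Made-up responses for illustration only.
tenth = Fraction(1, 10)
estimates = [8 * tenth, 8 * tenth, 8 * tenth, 3 * tenth, 5 * tenth, 5 * tenth]
truths = [True, True, False, True, True, True]
curve = symmetric_curve(estimates, truths)

skew_symmetric = all(curve[1 - p] == 1 - curve[p] for p in curve)
print(skew_symmetric, curve[5 * tenth])   # True 1/2
```

Both sentences rated .5 in this made-up set are true, yet F(.5) comes out as exactly one-half, because each such response is counted once as a hit and once as a miss.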

The general pattern exemplified by Figure 9 has often been characterized
as indicating that the individual overvalues his knowledge, i.e., that
he is overconfident. Thus, when he says .8, he really has "grounds" only
for saying .6. When he says 1.0, he is justified only in saying .85, etc.
This mode of speaking is fraught with semantic traps. Thus, does he "overvalue"
his information for estimates less than .5? But as we have seen, an
estimate of less than .5 is always coupled with an estimate greater than .5.
An even more intricate snare will be discussed later, after introducing the
theory of errors approach to estimation.

* 

This  convention  can  lead  to  verbal  puzzles  at  .5.  In  particular  it  implies 
that  an  individual  is  always  completely  realistic  for  the  estimate  .5.  If 
an  individual  is  presented  with  a set  of  sentences,  all  of  which  are  true, 
and  he  responds,  "The  probability  that  this  sentence  is  true  is  .5"  to 
every  sentence,  he  is  right  one-half  the  time. 


The suggestion has been made that the realism curve be used as a method
of generating objective probabilities from the individual's subjective
reports. The F(R) curve, on this suggestion, can be used to correct the
individual's estimates. At first glance, this looks like a fairly attractive
idea. As mentioned above, the F(R) curve might be thought of as a kind
of "theory of the instrument" of an individual making probability judgments.
Various nonlinear relationships have been observed by psychologists between
physical stimuli and perceived magnitudes. Offhand, there is no reason why
there should not be a nonlinear scaling between objective probabilities and
"perceived" probabilities. Presumably, such a relationship could be used
to rescale the subjective probability estimates.

In order for this scheme to have any value, the observed F(R) curve
must be a stable property or trait of the individual. That is, over a
fairly wide variety of types of questions and circumstances, the observed
relative frequencies must be roughly the same for the same reported probabilities.
There is a fair amount of evidence that this is probably not the
case; that the degree of realism is a function of the type of question being
asked.*

A much more serious objection to using empirical F(R) curves for rescaling
probability estimates is presented by the fact that probabilities are
absolute scales, and allow no transformations. This statement appears to
be significant enough to warrant being stated as a theorem.

Theorem 6. If P is a probability measure on the event space U, then
there is no function F(P) ≠ P, which is also a probability measure on U.

* See the discussion of "hard" questions in Chapter IV.

Proof: Let {E_j} be an m-fold equipartition of U, i.e., P(E_j) = P(E_k) =
1/m for all j and k. Let F be any function of P. We have F(P(E_j)) = F(1/m).
If F is a probability measure, since the partition is exhaustive,
Σ_j F(P(E_j)) = 1 = mF(1/m). Whence, F(1/m) = 1/m. Consider any n < m of the E_j.
P(v_j E_j) = n/m. Since F(P) is a probability measure, and the E_j are exclusive,
F(P(v_j E_j)) = Σ_j F(P(E_j)) = n/m = F(n/m). Since any real number can be
approximated by a sequence of rationals, if F is continuous, F(x) = x for any real
number x. The proof requires that U contain m-fold equipartitions for arbitrarily
large m. This condition is assured by P7.

There  are  several  ways  Theorem  6 can  be  viewed  with  respect  to  calibra- 
tion. Suppose  P is  the  actual  probability  measure  on  U — i.e.,  P is  the 
process  which  generates  the  occurrence  or  nonoccurrence  of  the  events  tab- 
ulated to  give  the  relative  frequency  F(R),  where  R is  the  individual's 
subjective  probability  measure  on  U.  If  R differs  from  P,  then  there  is  no 
stable relationship between P and R. In other words, two sequences E_i and
F_i can be selected out of U, each with the same estimated probability, and
each  with  different  probabilities  of  occurrence. 

Another way to view the theorem is the following: if an individual is
a consistent probability estimator, then no rescaling of his estimates is
also consistent. On the other hand, if the individual is not consistent,
then his estimates cannot be used with confidence, rescaled or not. To make
the dilemma clear, suppose we are interested in P(E v F) where we know E and
F are exclusive. There are two ways we can obtain this estimate: (1) Let
the individual estimate P(E) and P(F), rescale these, and take the sum.
(2) Let the individual estimate P(E v F) and rescale this estimate. If the
individual is not fully realistic, these two procedures will generally give
two quite different numbers. Which is the best estimate?

If only one number is required, i.e., if there is no intention of
using the individual's probability judgment in further computations, then
possibly a rescaling procedure could be justified, where the individual
estimates precisely the probability desired. However, in almost all interesting
applications, various manipulations of estimates are needed to complete
the analysis.
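The dilemma can be made concrete with a small numerical sketch. The rescaling curve `rescale` (a square root) and the two reported estimates are hypothetical, chosen only so the arithmetic is easy to follow; they are not taken from any observed calibration data. The sketch also assumes the individual's own estimates are additive.

```python
import math

def rescale(p):
    """Hypothetical nonlinear calibration curve F(R); sqrt is an arbitrary choice."""
    return math.sqrt(p)

# Reported (subjective) estimates for two exclusive events E and F (made up).
r_e, r_f = 0.09, 0.16
# Assumption: the individual is additive, so his estimate of "E or F" is the sum.
r_union = r_e + r_f

# Procedure (1): rescale each estimate, then add.
sum_of_rescaled = rescale(r_e) + rescale(r_f)
# Procedure (2): rescale the estimate of the union directly.
rescaled_union = rescale(r_union)

print(f"(1) rescale then add: {sum_of_rescaled:.2f}")
print(f"(2) add then rescale: {rescaled_union:.2f}")
```

With any nonlinear curve the two procedures disagree, which is the point of Theorem 6: only the identity rescaling preserves additivity.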

In  a way,  this  result  is  somewhat  disappointing,  since  the  notion  of 
calibration  appeared  to  be  a way  of  tying  probability  estimates  to  reality. 
However, in another way the result is comforting. It says flatly that realism
curves  define  objective  probabilities  if  and  only  if  the  individual  is  fully 
realistic.  The  question  of  how  to  proceed  if  an  individual  is  not  "reason- 
ably realistic"  will  be  pursued  in  Chapter  IV. 

7. Theory of Errors Model

The  theory  of  errors  is  perhaps  the  most  widely  used  of  the  estimation 
models  in  experimental  psychology.  It  is  most  often  applied  to  simple  magni- 
tude estimates,  but  in  theory  applies  to  any  quantifiable  judgment.  In 
elementary  form  the  model  assumes  that  an  estimate  has  two  components,  a 
stable,  non-variable,  component,  and  a random  error  component.  For  estimates 
where  a correct  or  true  response  is  definable,  it  is  usually  assumed  that  the 
stable  component  is  the  true  answer,  and  any  given  response  of  an  individual 
is the sum of that true answer and a random perturbation, i.e.,

R = T + C (6) 


In a previous publication I was ambiguous concerning the notion of calibration
as a foundation for a theory of group estimation. Theorem 6 pretty well
clears up the ambiguity; calibration is an insubstantial foundation. The
earlier formalism is still valid. The notation P(E|R) must be interpreted as
the probability that the event E will occur, given that the individual asserts
R, and cannot be interpreted as F(R) derived from some calibration curve.

The source of the random perturbation C is usually not identified; it
is  assumed  that  variable  factors  both  in  the  immediate  environment  and  in 
the  internal  estimation  process  lead  to  variability  in  the  individual's 
response.  The  theory  consists  primarily  in  characterizing  the  properties 
of  this  variation.  It  is  usually  assumed  that  the  random  error  component 
has  a mean  of  zero,  and  its  value  on  any  given  response  is  independent  of 
its  value  on  any  other  response. 

In addition, it is often assumed that the random error is normally distributed,
i.e., it has the density function

$$\phi(C) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-C^2/2\sigma^2}$$

where $\sigma$ is the standard deviation of the error. Thus the total response R is
normally distributed with mean equal to T. To this extent, the theory is quite
analogous  to  the  theory  of  a fallible  instrument  in  the  physical  sciences. 

For  certain  kinds  of  estimates  such  as  those  involved  in  psychophysical 
measurements,  it  appears  feasible  to  replicate  the  estimates  so  that  direct 
observational verification of the assumptions can be made; the shape of the
distribution can be determined, and parameters like the mean and standard
deviation can be computed. However, for the kind of estimate we are
interested in, direct observation of random errors is difficult. The individual
is likely to remember his previous answer, and thus the basic assumption
of independence on replication does not hold. Statements concerning
random error have to be made indirectly, based on the consequences of assumptions
about the form of the random variability. For this reason, some investigators
prefer terms like "residual variability" or "unexplained variation."

To make the theory applicable to the kind of data that is obtained in
experiments with uncertain questions, an additional feature is needed, namely
a bias component. Thus, an individual response is considered to be the sum
of three factors

R = T + B + C (7)

The  bias  term,  B,  like  T is  a stable  component  — i.e.,  it  is  constant 
for  a given  individual  and  a given  question.  Equation  (7)  is  equivalent 
to the assumption that the individual "selects" his response out of a distribution
that is centered around some mean that is displaced by the bias
from  the  true  response,  as  illustrated  in  Figure  11. 
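A short simulation may help fix the roles of the three terms in Equation (7). It is an idealized sketch: the values of `T`, `B`, and `SIGMA` are hypothetical, the error is taken to be normal, and independent replication is simply assumed even though, as noted above, it is hard to obtain from a real respondent.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

T = 100.0      # true answer (hypothetical)
B = -15.0      # stable bias for this individual and question (hypothetical)
SIGMA = 10.0   # standard deviation of the random error C (hypothetical)

def respond():
    """One response under Equation (7): R = T + B + C, with C ~ Normal(0, SIGMA)."""
    return T + B + random.gauss(0.0, SIGMA)

responses = [respond() for _ in range(10_000)]
mean_r = statistics.fmean(responses)
sd_r = statistics.stdev(responses)

# Averaging removes the random error C but not the bias B:
# the mean settles near T + B, not near T.
print(f"mean response   = {mean_r:.1f}  (T + B = {T + B})")
print(f"sd of responses = {sd_r:.1f}  (sigma = {SIGMA})")
```

The sample mean recovers T + B rather than T, so replication alone cannot reveal the bias unless the true answer is known.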

The notion of bias has not received as much attention in psychological
literature as the notion of random error (they are often lumped together
as "error"); a simple illustration from a physical situation may make the
idea clearer. Suppose there is a marksman firing at a target who has not
compensated adequately for windage or distance. His pattern of shots might
look like the dots in Figure 12, which are clustered about a point displaced
from the center of the target. The displacement illustrated by the solid
line in the figure is the bias of the pattern; the offset from the center
of the pattern, illustrated by the dashed line, is the random error [...]
specific shot labelled R. It should be clear from this illustration [...]
the notions of bias and random error are idealizations [...]
"[...] influences" such as wind and adjustment for distance are [...]
constant throughout the trial.

* Figure 12 can be used to illustrate a [...]
requiring judgment. Consider an individual [...]
equally large random error. These two [...]
precisely correct. If this occurs in [...]
individual can be credited with [...]

If  the  bias  is  unknown,  the  process  appears  to  be  a random  selection 
of  a response  R out  of  a distribution  with  mean  M where  M = T + B.  In 
Figure  11,  B is  negative. 

In  Figure  11  the  abscissa  is  labeled  R.  The  distribution  is  of  responses, 
not  of  the  quantity  being  estimated.  The  question  arises  whether  R is  scaled 
in  the  same  manner  as  the  target  quantity.  There  has  been  an  intensive  study 
for  over  a century  of  the  nonlinear  relationships  between  physical  and  psycho- 
logical magnitudes.  For  sensory  modalities,  such  as  perceived  intensity  of 
sound, perceived weight, and the like, the relationship between the psychological
scale ψ and the physical scale q (for quantity), ψ = f(q), is usually
nonlinear.  There  has  been  a lively  and  continuing  debate  as  to  whether  there 
is  a general  form  for  the  psychophysical  function,  and  whether  it  is  loga- 
rithmic as  postulated  in  the  pioneering  Weber-Fechner  studies,  or  a power 
law as urged by S. S. Stevens, or some more general relationship.

To  my  knowledge,  there  has  not  been  a similar  investigation  of  the  scal- 
ing question  with  respect  to  non-sensory  estimates  of  uncertain  numbers. 
Examination  of  the  data  which  have  been  generated  in  experiments  with  such 
estimates leads to what could be called the psychonumeric hypothesis: Individuals
estimating an uncertain quantity tend to scale their responses on
the logarithm of the quantity. This statement may seem a little bizarre,
since  the  language  in  which  responses  are  expressed  is  often  the  same  as  the 
language  used  to  describe  the  physical  quantity.  In  what  sense  can  we  say 
that  100  "seems"  twice  as  large  as  10  rather  than,  as  arithmetic  requires, 
ten  times  as  large? 

Rather  than  trying  to  resolve  the  semantic  puzzles  generated  by  this 
kind of talk, it is probably less mystifying to look at some of the data
which suggests the hypothesis. Figure 13 shows the distribution of several
thousand responses of student subjects to numerical questions like "How many
telephones are there in Africa?" [19] The size of the numbers being estimated
covered a wide range, from "How many appointments have been made to the U.S.
Supreme Court since 1930 (as of 1969)?", answer 20, to "How many gallons of
beer were produced in the U.S. in 1964?", answer 3 billion 193 million.

To make the responses comparable, they were transformed in the following
fashion: z scores were computed for the logarithms of the responses,
where the z score is computed as

z = (log R - m)/s

m is the mean of the log responses (on a given question) and s is the standard
deviation of the log responses (again for the same question), i.e.,

$$s^2 = \frac{1}{n}\sum_i (\log R_i - m)^2, \qquad m = \frac{1}{n}\sum_i \log R_i.$$

Figure 13 displays the distribution of $e^z$.

The  smooth  curve  in  Figure  13  is  the  log  normal  density  function,  i.e., 
it  is  the  distribution  that  would  be  expected  if  the  logarithms  of  the 
responses  were  normally  distributed.  As  can  be  seen  from  the  figure,  the 
log normal distribution is a very good approximation to the data.
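The transformation used to pool responses across questions can be sketched as follows. The function name `log_z_scores` and the sample answers are hypothetical, and dividing by n (rather than n - 1) in the standard deviation is an assumption about the original computation.

```python
import math
import statistics

def log_z_scores(responses):
    """z = (log R - m) / s, with m and s the mean and SD of the log responses."""
    logs = [math.log10(r) for r in responses]
    m = statistics.fmean(logs)
    s = statistics.pstdev(logs)  # population form (divide by n); an assumption
    return [(x - m) / s for x in logs]

# Made-up answers to a single question.
answers = [2_000, 10_000, 50_000, 100_000, 1_000_000]
z = log_z_scores(answers)

# The same answers on a question whose numbers are 1000 times larger
# give identical z scores, which is what makes questions comparable.
z_scaled = log_z_scores([1000 * a for a in answers])

# Values plotted in a figure like Figure 13 would be e**z.
plotted = [math.exp(v) for v in z]
```

Because the z scores are unchanged when every answer is multiplied by a constant, responses to questions of very different size can be pooled into one distribution.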

The  skewness  of  the  distribution  of  responses  is  to  be  expected  on  the 
grounds  that  all  of  the  responses  have  a natural  lower  bound  — all  the  ques- 
tions involved  answers  greater  than  zero  — but  no  natural  upper  bound.  And 
the  precise  shape  of  the  distribution  can  be  explained  by  other  assumptions 
than  the  psychonumeric  hypothesis.  However  there  is  other  evidence. 

Figure  14  displays  the  standard  deviations  of  the  log  responses  graphed 
against the logarithm of the true answer. The true answer expresses the
"size" of the number being estimated. The standard deviation has the property
that it is invariant under a translation, i.e., s(x + a) = s(x), where a is
any constant. Similarly, the standard deviation of the logarithm of a set
of responses is invariant under multiplication by a constant, i.e.,
s(log ax) = s(log x). If the subjects were scaling their responses on the
physical numbers, we would expect the curve in Figure 14 to be flat, rather
than rising with the value of log T. Again, there are alternate explanations
of Figure 14 other than the psychonumeric hypothesis, and it can be considered
only one piece of supporting evidence.

Figure 14. Average Standard Deviation as a Function of Log True
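The two invariance properties of the standard deviation quoted above are easy to check numerically. The responses `x` and the constant `a` below are arbitrary illustrative values.

```python
import math
import statistics

sd = statistics.pstdev

x = [12.0, 30.0, 45.0, 80.0, 200.0]   # hypothetical responses
a = 1000.0                            # arbitrary constant

shifted = sd([v + a for v in x])                  # s(x + a)
log_scaled = sd([math.log10(a * v) for v in x])   # s(log ax)
raw_scaled = sd([a * v for v in x])               # s(ax), shown for contrast

print(f"s(x)      = {sd(x):.4f}   s(x + a)    = {shifted:.4f}")
print(f"s(log x)  = {sd([math.log10(v) for v in x]):.4f}   s(log ax)   = {log_scaled:.4f}")
print(f"a * s(x)  = {a * sd(x):.1f}   s(ax)       = {raw_scaled:.1f}")
```

Multiplying the raw responses by a constant multiplies their standard deviation by that constant, while the standard deviation of the logarithms is unaffected.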

Figure 15 shows a plot of the average log error against log T, where log
error = |log R - log T| and the vertical bars represent absolute value,
i.e., taking the error without sign. Again, the log error increases roughly
linearly with log T. There is no theory at present which ties the size of
the error with the size of T. If, however, we assume that error scales with
the size of the number being estimated (if the answer to question A is twice
as large as the answer to question B, then on the average, the error in
estimating A will be twice the error in estimating B), then the curve in
Figure 15 should be flat, which it is not.

A somewhat more interesting direction in which to explore the question
of scaling comes from examination of the distribution of digits in responses.
One of the intriguing features of statistical tables, such as are found in
almanacs and similar reference works, is the distribution of the first digits
of the numbers. There is a tendency for the first digits to be distributed
in a logarithmic pattern; specifically, the frequency of digit d (d = 1,...,9)
is roughly proportional to log(d + 1) - log d. [20] A somewhat more general
hypothesis would be that the tabulated numbers x are themselves distributed
as 1/x. This would imply that not only the first digits but the second and
subsequent digits would also have the appropriate distribution, e.g., the
frequency of d as a second digit would be proportional to

$$\sum_{i=1}^{9} \left[\log(10i + d + 1) - \log(10i + d)\right].$$

I haven't had the opportunity to verify this hypothesis in detail (a
relatively large body of data would be needed to generate stable statistics),
but a quick try of a few thousand numbers selected more or less
at random out of an almanac looks favorable. Figure 16 shows the distribution
of second digits so obtained. The dotted lines show the theoretically
expected frequencies.
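The expected digit frequencies under the logarithmic hypothesis can be tabulated directly. In this sketch the function names are hypothetical, logarithms are taken to base 10 so that the frequencies sum to 1, and the second-digit sum runs over the nine possible first digits i = 1,...,9.

```python
import math

def first_digit_freq(d):
    """Expected frequency of d (1..9) as a first digit: log10(d + 1) - log10(d)."""
    return math.log10(d + 1) - math.log10(d)

def second_digit_freq(d):
    """Expected frequency of d (0..9) as a second digit, summed over first digits i."""
    return sum(math.log10(10 * i + d + 1) - math.log10(10 * i + d)
               for i in range(1, 10))

first = {d: first_digit_freq(d) for d in range(1, 10)}
second = {d: second_digit_freq(d) for d in range(0, 10)}

for d in range(0, 10):
    f1 = f"{first[d]:.3f}" if d in first else "  -  "
    print(f"digit {d}: first {f1}   second {second[d]:.3f}")
```

The first-digit frequencies fall steeply from the digit 1 to the digit 9, while the second-digit frequencies are much flatter, declining only slightly from 0 to 9.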

More  relevant  to  estimation,  when  the  several  thousand  responses  to  esti- 
mation questions  referred  to  above  were  analyzed  in  terms  of  the  distribution 
of  first  digits,  they  exhibited  precisely  the  same  logarithmic  distribution 
as  the  data  from  the  almanac  tables.  The  distribution  of  the  first  digits 
of  the  responses  is  given  in  Figure  17.  The  only  major  departure  from  the 
theoretical distribution is an evident preference for the digit 5.

One  might  be  tempted  to  believe  that  the  distribution  of  first  digits 
in  the  responses  is  being  "driven"  by  the  corresponding  distribution  in  the 
true  answers  which  the  subjects  are  trying  to  approximate.  Indeed,  the  true 
answers  exhibit  the  logarithmic  distribution.  However,  the  two  distributions 
are completely independent. Whatever psychological mechanism generates the
distribution  of  responses,  it  is  not  tied  to  the  mechanism  that  generates  a 
logarithmic  distribution  of  first  digits  in  the  almanac  tables. 

The distribution of first digits, then, is partial confirmation of the
hypothesis  that  the  real  number  system  in  the  minds  of  the  respondents  is 
distributed  like  1/x. 

Some additional supporting evidence for the psychonumeric hypothesis
will be discussed in the section dealing with group judgment and the theory
of errors, Chapter V. But to sum up what we have: if we assume that the
individual scales his responses on the logarithm of the number he is trying
to estimate, then we have a fairly straightforward explanation for the log
normal distributions of responses, for the increase in standard deviations
and errors with size, and for the logarithmic distribution of first digits
in the responses. There does not appear to be a natural way to test the
hypothesis directly. The direct question "how much bigger than 1 million is
one billion" has too facile a response from arithmetic.

Figure 17. Distribution of First Digits, Subject Responses (5,037 Responses)

The psychonumeric hypothesis can be interpreted as asserting that the
individual "thinks" in terms of the log transform scale, and thus needs, so
to  speak,  a double  translation  process,  first  expressing  the  response  space 
for  the  question  in  terms  of  the  log  scale,  performing  his  estimate  in  this 
response  space,  and  then  retranslating  the  response  into  "ordinary  numbers." 
Of  course,  if  this  were  done  literally,  the  power  of  computation  available 
to  the  individual  would  have  to  be  somewhat  beyond  the  capabilities  observed 
in lower division mathematics courses. The problem of representing such
"internal scales" has been treated by Anderson.

For our purposes, it is useful to express the psychonumeric hypothesis
explicitly. This can be done by considering the logarithmic form of Equation
(7). If we let lower case letters stand for the log transform of the
corresponding upper-case letters, i.e., r = log R, t = log T, c = log C, etc.,
then we have

r = t + b + c (8)

If we assume c is normally distributed, then the density function for r is

$$D(r) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(r - t - b)^2 / 2\sigma^2} \qquad (9)$$

where $\sigma$ is the standard deviation of the error component. Equation (9) asserts
that  the  distribution  D(R)  of  the  overt  response  of  the  subject  is 

D(R)  = D(r)  (10) 

Equation  (9)  looks  a good  deal  more  intricate  than  it  is;  it  asserts  that 
the  distribution  of  individual  responses,  if  we  could  observe  it,  would  look 
like  the  distribution  in  Figure  13. 

The  distribution  defined  by  Equation  (9)  is  not  the  same  as  the  indi- 
vidual's subjective  probability  distribution  on  the  quantity  being  estimated. 
The subjective theory of probability expounded in the previous section has
the  consequence  that  the  individual  has  a subjective  probability  distribution 
on  any  quantity  (at  least  any  which  he  contemplates).  It  seems  likely  that 
there  is  some  relationship  between  the  response  distribution  and  this  subjec- 
tive probability  distribution,  but  neither  the  theory  of  errors  nor  subjective 
probability theory provides a formal link. If an experimental procedure
could be devised which identified the response distribution sharply, it would
doubtless be highly informative to make a comparison between the two types
of  distribution.  One  would  expect  a rough  correlation  between  the  standard 
deviations  of  the  two,  and  it  seems  likely  that  the  means  would  correspond 
closely  to  each  other. 

An  instructive  investigation  arises  from  forming  a hybrid  between  sub- 
jective probability  theory  and  the  theory  of  errors.  In  the  section  on 
calibration, it was pointed out that most individuals are poorly calibrated.
This is usually interpreted as indicating that the individual overestimates
his information. This interpretation is strengthened by testing the individual's
propensity to bet on his estimates. Slovic reports that most of
his subjects were willing to bet on their estimates, even those with extreme
odds. [22] But the situation may be more complicated. Suppose we assume that
the individual's probability estimates are subject to the same sort of random
error as magnitude estimates, so that formula (6) applies. It turns out
that this assumption leads to calibration curves very much like Figure 8.

The  assumption  that  the  random  error  is  normally  distributed  is  inappro- 
priate for  the  restricted  interval  of  probabilities.  Other  types  of  error 
function can be postulated; in the analysis below I investigate two possibilities,
a beta function and the negative exponential.

Using  formula  (6)  rather  than  formula  (7)  implies  that  the  mean  of  the 
individual's  probability  judgments  is  the  "true"  probability.  There  are  many 
investigators in the field of probability estimates who deny the existence
of "true" or objective probabilities for the kind of question where calibration
is of interest. I'll bypass that sticky point by making a less troublesome
assumption; namely, the assumption that if the individual always reported
the mean of his response distribution, then he would be fully calibrated.
Although this appears to be a strongly "favorable" assumption, it will turn
out that the subject will be poorly calibrated when the random error is
included.

Case 1: Beta distribution. Consider an individual who selects a
response R out of a distribution of the form

$$D(R) = a R^{T} (1 - R) \qquad (11)$$

where T is determined by the mean, $\bar{R}$. In this case

$$T(\bar{R}) = \frac{3\bar{R} - 1}{1 - \bar{R}}$$

a is a normalizing constant, which is also a function of the mean

$$a(\bar{R}) = (T(\bar{R}) + 1)(T(\bar{R}) + 2)$$

Equation (11) is the form appropriate for $\bar{R}$ > .5; for $\bar{R}$ < .5 the
appropriate form is

$$D(R) = a R (1 - R)^{T} \qquad (12)$$

An example of Equation (11) is given in Figure 18 where $\bar{R}$ = .35.
If we assume that the individual is posed a large set of questions for
which the objective probabilities are uniformly distributed between 0 and 1,
we can calculate the resulting calibration curve. The results are presented
in Figure 19.
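The following sketch shows one way such a calibration curve could be computed; it is an illustration under stated assumptions, not a reconstruction of the original calculation. It assumes the exponent is T = (3m - 1)/(1 - m) for a mean m of .5 or more (which follows from the mean (T + 1)/(T + 3) of the density in Equation (11)), treats Equation (12) as the mirror image of Equation (11), takes the objective probability p to be the mean of the response distribution, and approximates the integrals by a midpoint rule on a grid. The names `density`, `calibration`, and `GRID` are hypothetical.

```python
def density(r, mean):
    """Response density: Equation (11) for mean >= .5, its mirror image (12) below."""
    if mean < 0.5:
        return density(1.0 - r, 1.0 - mean)
    t = (3.0 * mean - 1.0) / (1.0 - mean)        # exponent T as a function of the mean
    return (t + 1.0) * (t + 2.0) * r**t * (1.0 - r)

N = 2000
GRID = [(i + 0.5) / N for i in range(N)]         # midpoints of N equal cells on (0, 1)

def calibration(r):
    """Relative frequency of occurrence among questions answered with response r.

    Assumes objective probabilities p are uniform on (0, 1) and that each p
    is the mean of the distribution from which the response is drawn.
    """
    weights = [density(r, p) for p in GRID]
    return sum(p * w for p, w in zip(GRID, weights)) / sum(weights)

# Midpoint-rule checks: the density integrates to 1 and has the stated mean.
area = sum(density(r, 0.7) for r in GRID) / N
mean_check = sum(r * density(r, 0.7) for r in GRID) / N

for r in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"reported {r:.1f} -> relative frequency {calibration(r):.2f}")
```

Under these assumptions the computed curve is flatter than the diagonal: events reported with high probability occur less often than stated, and events reported with low probability occur more often.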

Case 2: Negative exponential distribution. If we assume that the shape
of the distribution is affected by the degree of certainty of the individual
in his estimate, then the distribution should become "flatter" as the degree
of uncertainty increases. A rather extreme case is afforded by the negative
exponential

$$D(R) = a e^{-bR} \qquad (13)$$

where both a and b are functions of the mean $\bar{R}$. An example of Equation (13)
for the case $\bar{R}$ = .25 is given in Figure 20.

The  negative  exponential  is  the  maximum  entropy  distribution  where  only 
the mean is known. Thus it is the "flattest possible" curve given the assumption
of realism for the mean. The calibration curve resulting from this
distribution  is  displayed  in  Figure  21. 
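Since a and b in Equation (13) are fixed by the mean, they can be found numerically. The sketch below assumes the density is confined to the interval from 0 to 1, in which case the mean is 1/b - 1/(e^b - 1) and the normalizing constant is a = b/(1 - e^{-b}); b is then obtained by bisection. The function names are hypothetical.

```python
import math

def truncated_exp_mean(b):
    """Mean of the density a * exp(-b * R) restricted to 0 <= R <= 1."""
    if abs(b) < 1e-8:
        return 0.5                      # limiting case: uniform density
    return 1.0 / b - 1.0 / math.expm1(b)

def solve_b(mean, lo=-500.0, hi=500.0):
    """Bisection for the b that yields the requested mean (mean decreases as b grows)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if truncated_exp_mean(mid) > mean:
            lo = mid                    # mean still too large: need a larger b
        else:
            hi = mid
    return 0.5 * (lo + hi)

def normalizer(b):
    """Constant a that makes a * exp(-b * R) integrate to 1 on [0, 1]."""
    if abs(b) < 1e-8:
        return 1.0
    return b / (-math.expm1(-b))

b_25 = solve_b(0.25)
a_25 = normalizer(b_25)
print(f"mean .25: a = {a_25:.3f}, b = {b_25:.3f}")
```

A mean below .5 gives a positive b (density falling with R), a mean above .5 gives the mirror image with negative b, and a mean of exactly .5 gives the uniform density.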

The resemblance of the two calibration curves to those observed empirically
is rather striking. As might be expected, the maximum entropy distribution
leads to poorer calibration than the beta distribution. Nevertheless,
it should be pointed out that the resulting calibration curves for either the
beta distribution or the negative exponential are still "better" than the
empirically observed curve, Figure 9. It seems likely that bias is playing
a role in Figure 9 as well as random error.

Figures 19 and 21 are especially interesting in that they are derived from
an assumption which in a sense is the reverse of the usual interpretation.
The assumption is that the individual is highly uncertain and this leads to
variability in his responses. Rather than being overconfident, he is
(behaviorally) just the reverse.

There  is  a peculiar  nursery  rhyme  quality  about  this  analysis;  the 
situation  is  much  like  Jack  Horner,  thrusting  his  thumb  at  random  into  the 
response  pie,  pulling  out  a response,  and  then  saying  "What  a good  boy  am 
I, that's just right!"

The  analysis  presented  here  seems  to  undercut  further  the  notion  of 
calibration  as  a feasible  procedure  for  correcting  an  individual's  proba- 
bility judgments.  For  any  given  judgment,  the  "appropriate"  correction 
would  not  be  the  empirically  established  F(R),  but  rather  the  unknown  mean, 
m,  of  the  individual's  response  distribution. 




CHAPTER  III.  FIGURES  OF  MERIT 



It  was  stressed  in  the  Introduction  that  an  essential  aspect  of  the 
resolution  of  disagreement  is  the  formulation  of  figures  of  merit  or  scoring 
rules.  Some  measure  of  performance  is  needed  to  apply  the  Emerson  principle  — 
to  show  that  one  resolution  procedure  generates  a judgment  of  greater  excel- 
lence than  another.  The  present  chapter  is  concerned  with  figures  of  merit 
for  factual  statements.  Value  Judgments  will  be  taken  up  in  Chapter  VI. 

There  is  a bewildering  variety  of  scoring  procedures  to  be  found  in  the 
literature.  There  does  not  appear  to  be  a general  theory  encompassing  all  of 
these  and  their  obvious  extensions.  One  reason  is  that  scores  can  be  hand- 
maidens to  a variety  of  kinds  of  excellence.  One  large  and  not  very  well 
organized class of scores has evolved around the function of specifying scientific
excellence, i.e., furnishing criteria for the decision "statement x is
scientifically  acceptable."  Another  large  body  of  procedures  has  grown  up 
around  psychological  measurement  — using  test  scores  to  attach  descriptive 
indices such as I.Q.'s to individuals. Scores can be used as motivating
devices  as  with  school  grades,  or  National  Football  League  standings.  And 
scores  can  perform  multiple  roles,  such  as  being  constructs  for  model  build- 
ing. The  Gross  National  Product  is  a figure  of  merit  for  the  economy,  but 
it  is  also  a basic  notion  in  macro-economic  theories. 

Another  complexity  is  the  fact  that  scores  themselves  can  be  subjected 
to  evaluation.  A given  scoring  procedure  may  or  may  not  be  a good  measure, 
depending on the role it is intended to play. A case in point is the theory
of scoring as applied to psychological measurement. There is (from the point
of view of the present effort) an interesting ambiguity that pervades most
of  this  subject.  The  usual  function  of  a psychological  test  is  to  attach  a 
number  to  an  individual,  denoting  an  aptitude,  a trait,  or  perhaps  an  atti- 
tude. The  function  of  attaching  a figure  of  merit  to  the  responses  is 
secondary.  In  intelligence  testing,  for  example,  the  test  constructor  pre- 
sumably knows  the  answers  to  the  questions.  For  some  tests,  like  the  intel- 
ligence test,  attaching  an  independent  objective  score  to  the  items  is  an 
intermediate step to the basic intent. For other tests, such as personality
"inventories"  or  attitude  scales,  there  is  no  objectively  correct  answer  to 
the  items.  Even  for  those  tests  where  there  is  an  objectively  correct  answer 
it  is  becoming  clear  that  utilizing  data  for  responses  which  are  not  correct 
may be more diagnostic than simply counting correct answers.

Very generally speaking, given a response R (to a factual question) and
given  the  true  response  T,  a score  measures  the  discrepancy  between  R and  T; 

i.e.,  there  is  a function  S(R,T)  which  expresses  the  degree  to  which  R approxi- 
mates T on  some  criterion.  S will  thus  depend  on  the  form  of  R and  T,  as  well 
as  the  role  of  R in  a decision  process.  The  following  list  of  frequently  used 
scores  is  not  intended  to  be  exhaustive;  however,  the  list  is  representative 
of  the  range  of  scoring  methods  that  have  played  a role  in  decision  analysis. 

Types  of  Scores 

1.  Binary  scores 

2.  Distance  scores 

3.  Scaled  distance  scores 

4.  Correlations 

5.  Probability  scores 

6. Decisional scores

Binary Scores. The simplest kind of binary score is a direct comparison
between R and T, e.g., 1 (or "true") for the case R = T, and 0 (or "false")
for R ≠ T. This is the scoring scheme used in traditional two-valued logic.
1 is, of course, excellent, and 0 is bad. Most of the rules of deductive
logic are concerned with transformations which preserve the score 1.
Straightforward extensions of this type of score are obtained by "acceptable
level of approximation" forms, e.g., S(R,T) = 1 if |R - T| < c, otherwise
S(R,T) = 0, where c is a constant defining a region around T that is "close
enough."
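The two binary scores just described can be written out in a few lines. The function names and the numerical values are illustrative only.

```python
def binary_score(r, t):
    """1 if the response equals the true answer exactly, else 0."""
    return 1 if r == t else 0

def tolerance_score(r, t, c):
    """1 if the response lies within c of the true answer, else 0."""
    return 1 if abs(r - t) < c else 0

true_answer = 2160   # illustrative true value
estimate = 2100      # hypothetical response

exact = binary_score(estimate, true_answer)           # strict comparison
close = tolerance_score(estimate, true_answer, c=100) # "close enough" region

print(f"exact score = {exact}, tolerance score (c = 100) = {close}")
```

The same response scores 0 under the strict comparison and 1 under the tolerance form, so the choice of c is part of the definition of the score.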

The  true-false  score  is  relatively  unambiguous  when  applied  to  singular 
estimates  like  "The  diameter  of  the  moon  is  2160  miles."  Difficulties  arise 
when  the  score  is  applied  to  general  statements  such  as  "Human  beings  have  a 
life  span  of  less  than  130  years,"  where  no  exhaustive  list  of  cases  can  be 
displayed.  An  implicit  requirement  for  most  scoring  rules  is  finite  applica- 
bility. As  a result,  scoring  rules  are  usually  restricted  to  singular  esti- 
mates, or  at  most  a finite  set  of  singular  estimates.  This  restriction  is 
followed  in  the  present  book.  It  clearly  leaves  unexamined  a significant 
realm  of  estimates  which  guide  decisions,  namely,  judgmental  generalizations. 
In part this gap is narrowed by considering correlations as types of scores,
and by probabilistic scores which have a hazy connection with generalization.
But neither confronts head on the issue of attaching figures of merit to
general  statements. 

Distance Scores. A more general kind of score can be defined if R and
T are elements in a metric space. A metric space is defined as a set of
elements where the distance D(x,y) between each pair of elements x,y is
defined, and D(x,y) fulfills the conditions:

D1. D(x,y) ≥ 0, and D(x,y) = 0 if and only if x = y.

D2. D(x,y) = D(y,x)

D3. D(x,y) + D(y,z) ≥ D(x,z)

For  ordinary  magnitudes,  D(x,y)  = ]x  - y[.  For  example,  the  distance 
between  5°  Fahrenheit  and  30°  Fahrenheit  is  25°.  However,  the  notion  of  a 
metric  space  is  very  general  and  applies  to  a wide  variety  of  types  of  quanti- 
ties and  types  of  distance  measures.*  Given  a metric  space,  we  can  define 
S(R,T)  = -D(R,T),  the  negative  sign  to  indicate  that  smaller  distances  are 
more  excellent.  I)(R,T)  is  the  common  definition  of  error. 

For many purposes, the distance squared is a more convenient measure.
For single quantities, the distance involves the absolute magnitude, which is
difficult to manipulate; the squared distance is more tractable. For
multidimensional quantities, the distance squared has a number of additional
convenient features. In n-dimensional euclidean space, the distance is defined
as

\[ D(x,y) = \Bigl( \sum_i (x_i - y_i)^2 \Bigr)^{1/2} \]

where $x_i$ and $y_i$ are the components of x and y on dimension i. The square
root is awkward, but in addition, $D(x,y)^2$ decomposes directly into the sum
of the squared differences on each coordinate, a very useful property.

The distance squared is associated with a number of indices which are
of basic utility in statistics, such as the variance and least squares
approximations. A useful relationship is the following:

\begin{align*}
\frac{1}{n}\sum_j D(R_j,T)^2 &= \frac{1}{n}\sum_j D(R_j,\bar{R})^2 + D(\bar{R},T)^2 \tag{1} \\
&= \mathrm{Var}(R) + D(\bar{R},T)^2 \tag{1a}
\end{align*}

* An elementary treatment, along with many considerations relevant to the
present chapter and Chapter V, is presented in Kemeny and Snell.

This relationship holds for a set of responses $R_j$, where $\bar{R}$ is the
mean, and Var(R) is the variance of the responses. The formula states that the
average squared distance of a set of responses to the true answer is just the
variance of the responses, plus the squared distance of the mean to the true
answer. In the language of error, the average of the squared error of the
individual responses is just the variance of the individual responses plus the
squared error of the mean.*
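The decomposition in (1) and (1a) can be checked numerically. The following is a minimal sketch, assuming one-dimensional responses with D(x,y) = |x - y| and the population (1/n) variance; the response values and the function name `decomposition` are hypothetical, chosen only for illustration.

```python
def mean(xs):
    return sum(xs) / len(xs)

def decomposition(responses, true_value):
    """Return (average squared error, variance, squared error of the mean)."""
    n = len(responses)
    r_bar = mean(responses)
    # Left side of (1): average squared distance of the responses to T.
    avg_sq_err = sum((r - true_value) ** 2 for r in responses) / n
    # Var(R): population variance, i.e. average squared distance to the mean.
    variance = sum((r - r_bar) ** 2 for r in responses) / n
    # Squared distance of the mean response to T.
    mean_sq_err = (r_bar - true_value) ** 2
    return avg_sq_err, variance, mean_sq_err

# Hypothetical estimates of the moon's diameter in miles (true value 2160).
responses = [2000.0, 2300.0, 1900.0, 2500.0]
true_value = 2160.0
lhs, var, bias_sq = decomposition(responses, true_value)
print(lhs, var + bias_sq)  # the two sides of (1a) should agree
```

With these made-up numbers most of the average squared error comes from the spread of the responses rather than from the error of their mean.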

* Actually, T and the $R_j$ can be any real numbers as far as (1) is concerned;
the formula does not depend upon the fact that T is the true answer to a
question for which the $R_j$ are responses. A similar remark holds for many of
the formulae in this book, where the notation is more restrictive than the
content of the statements.

Scaled Distances. It is often not the absolute size of an error that
is relevant, but the comparative size. This becomes especially important
when accuracy on different questions is being compared, or when an average
score over many questions is computed. For example, if the question is "How
many gallons of beer were consumed by the American public in 1970?" and the
answer is off by one million gallons, that is only an error of one-tenth of
one percent. But if the question is, "How many deaths due to accidents
occurred in the United States?" and the answer is off by one million, that
is an error of 1000 percent. Clearly, the latter error is "much worse." To
compare accuracy on these two questions, it seems natural to normalize the
responses to the size of the true answer, i.e., to define the error as
|R - T|/T. This particular normalization works only for ratio scales, where
there is a well-defined zero. For example, if a temperature is being
estimated, and T = 32° Fahrenheit, and the estimate is 41°, the scaled error is
9/32 = 0.28. However, if the problem is transformed to the Celsius scale, the
scaled error is 5/0 = ∞. For other purposes, $D(R,T)^2/T$ may be more
appropriate.
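The dependence of the scaled error on the zero point of the scale can be made concrete with the temperature example (an estimate of 41° against a true value of 32° Fahrenheit). This is a small sketch only; the helper names are hypothetical, and returning infinity for a zero true value is a convention adopted here to mirror the "5/0" in the text.

```python
def scaled_error(response, true_value):
    """|R - T| / T, with infinity when T is zero and the response is off."""
    if response == true_value:
        return 0.0
    if true_value == 0:
        return float("inf")  # convention assumed here for division by zero
    return abs(response - true_value) / abs(true_value)

def fahrenheit_to_celsius(f):
    return (f - 32.0) * 5.0 / 9.0

# Same estimate, same true value, two scales.
err_f = scaled_error(41.0, 32.0)
err_c = scaled_error(fahrenheit_to_celsius(41.0), fahrenheit_to_celsius(32.0))
print(err_f, err_c)
```

The Fahrenheit and Celsius scales have arbitrary zeros, so the same physical error yields two unrelated scaled errors; on a ratio scale such as length or count this ambiguity does not arise.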

More intricate forms of rescaling are often useful. The psychonumeric
hypothesis, for example, suggests that a logarithmic transformation of R and
T may be more in accord with perceived size than R and T unscaled. The
logarithmic transformation has been employed in a number of group judgment
studies³ with a slightly different justification. Thus, if the quantity in
question is measured by a ratio scale with a natural zero, but no upper bound
(such as length, height, age, etc.), then for a given T the range of
underestimates is fixed, namely, estimates between 0 and T, while the potential
range of overestimates is unlimited, from T to ∞. In actual practice, the
range of potential overestimates is usually not quite so grand, but it still
may be much larger than the range of underestimates. The distance measure
D(R,T) = log R/T = log R - log T evens out these two ranges:
-∞ < log R - log T < ∞.
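A short sketch of how the logarithmic measure balances the two ranges: an overestimate by a given factor and an underestimate by the same factor receive values of equal magnitude and opposite sign. The function name `log_distance` and the numbers are illustrative assumptions, and positive R and T are required.

```python
import math

def log_distance(response, true_value):
    """log(R/T) = log R - log T; both arguments must be positive."""
    return math.log(response) - math.log(true_value)

over = log_distance(200.0, 100.0)   # overestimate by a factor of two
under = log_distance(50.0, 100.0)   # underestimate by a factor of two
print(over, under)
```

On the raw scale the same two responses are off by 100 and 50 respectively, so the untransformed error would treat the overestimate as twice as bad.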

More generally, a transformation G(R), where G is monotonically
increasing in R, may be employed. The figure of merit then becomes
D(G(R),G(T)). One frequently employed transformation in statistics, where a
set of estimates of the same quantity is elicited, is the so-called normal
score or z-score

\[ G(R) = (R - \bar{R})/s_R \tag{2} \]

The use of the standard deviation as a normalizing factor is widespread
in statistics. In fact, the standard deviation itself has sometimes been
employed as a figure of merit, the so-called standard error. There is a
large family of statistical figures of merit, associated with the notion of
statistical significance. It will not be possible to deal with this family
in the present exposition. Some of the statistical measures are closely
related to probabilistic figures of merit to be discussed below.

Correlations. One statistical figure of merit closely related to the
factor model approach to estimation is the correlation. In the case of the
factor model, the object of interest (for the investigator) is the model, i.e.,
the process by which estimates (of a given kind of quantity) are generated,
not a single estimate. Thus an individual can be asked to generate a set of
responses $R_j$, each with a different true response $T_j$. The question is,
then, how closely the set $\{R_j\}$ matches the set $\{T_j\}$. One widely
employed measure is the average error, $\frac{1}{n}\sum_j D(R_j,T_j)$. More
frequently, the average squared error, $\frac{1}{n}\sum_j D(R_j,T_j)^2$, is
employed for reasons given above. It is also quite common to first compute
z-scores for the R's and T's.

The average squared error has one drawback. If all the R's contain a
large bias, even if the R's match the T's well otherwise, the average squared
error will obscure the fact. A measure which overlooks the bias is the
correlation, usually defined as

\[ \rho_{RT} = \frac{\sum_j (R_j - \bar{R})(T_j - \bar{T})}{n\, s_R s_T} \tag{3} \]

If we set $R' = (R - \bar{R})/s_R$, and $T' = (T - \bar{T})/s_T$, (3) becomes

\[ \rho_{RT} = \frac{1}{n}\sum_j R'_j T'_j \tag{4} \]

As might be expected, there is a close relationship between the average
squared error and the correlation. This relationship is given by the formula

\[ \frac{1}{n}\sum_j D(R_j,T_j)^2 = \mathrm{Var}(R) + \mathrm{Var}(T) - 2 s_R s_T \rho_{RT} + (\bar{R} - \bar{T})^2 \tag{5} \]
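Relation (5), and the way correlation overlooks bias, can both be checked on a small example. The sketch below assumes population (1/n) variances and standard deviations, as in (3); the data are hypothetical responses that track the true answers perfectly except for a constant bias of 5.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)  # population variance

def correlation(rs, ts):
    """Equation (3), with s_R and s_T the population standard deviations."""
    r_bar, t_bar = mean(rs), mean(ts)
    cov = sum((r - r_bar) * (t - t_bar) for r, t in zip(rs, ts)) / len(rs)
    return cov / math.sqrt(variance(rs) * variance(ts))

def avg_squared_error(rs, ts):
    return sum((r - t) ** 2 for r, t in zip(rs, ts)) / len(rs)

def right_side_of_5(rs, ts):
    s_r, s_t = math.sqrt(variance(rs)), math.sqrt(variance(ts))
    return (variance(rs) + variance(ts)
            - 2.0 * s_r * s_t * correlation(rs, ts)
            + (mean(rs) - mean(ts)) ** 2)

truths = [10.0, 20.0, 30.0, 40.0]          # hypothetical true answers
responses = [t + 5.0 for t in truths]      # perfectly ordered but biased
print(correlation(responses, truths))       # bias leaves the correlation at 1
print(avg_squared_error(responses, truths), right_side_of_5(responses, truths))
```

Here the whole of the average squared error is the bias term, which is exactly the component the correlation does not see.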

Correlation has received a "bad press" ranging from the frequently
iterated statement that correlations do not display causal relations, only
"associations," to recent contentions that correlations are inherently weak
measures.⁴ This contention has been pressed most strongly for multiple
correlation where several variables are involved. Roughly speaking, the idea is
that if several variables, $f_i$, are used to predict a given quantity T, then
if each of the variables individually is positively correlated with T
("monotonically related" is the more common expression) a high correlation will
obtain for about any "reasonable" linear combination of the variables; thus
the correlation is uninformative about the structure of the model
$T = F(f_1, f_2, \ldots)$.


The  first  objection  — the  non-causal  import  of  correlation  — is  not 
particularly  troublesome  when  a correlation  is  being  used  as  a score,  rather 
than  as  an  adjunct  to  theory  building.  For  the  general  case  of  non-perceptual 
estimates  there  is  no  presumption  that  an  estimate  is  causally  related  to  the 
phenomena being described. However, the fact that correlation ignores bias
is somewhat more serious. For theoretical investigations, demonstrating that
a significant correlation between estimates and true answers exists does not
imply that the responses will be particularly close to the true answers, only
that they will covary in a reasonable way.

The second objection is more to the point. If a relatively good score
can be guaranteed beforehand simply by the structure of the estimate and the
way the score is computed, the score may not be a useful measure of the
excellence of the estimate. This issue will be explored more thoroughly in
the next chapter on nominal judgments.

3. Probabilistic Scores

Scores for probability estimates warrant a section of their own. The
conceptual problems involved are much deeper than those associated with
magnitude estimates. There has been something like a major breakthrough in the
past decade or so in the formal theory of probabilistic scores, but the
significance of this theory for decision analysis is still under lively
exploration.

In the spirit of distance scores, if an individual estimates that the
probability of an event E is R, and the true probability is P, then
D(R,P) = |R - P| or $D(R,P)^2 = (R - P)^2$ might appear to be reasonable figures
of merit for R. The only difficulty with this measure is that in practice
the true probability P is usually unknown. However, there is a way of
obtaining a closely associated measure, without knowing P, by using the
notion of expectation. This can be illustrated by the expectation of the
characteristic function for an event. The characteristic function C(E) for
the event E is equal to 1 if E occurs, and equal to 0 if it does not occur.
The association of the characteristic function and the truth value of the
statement "E occurs" is clearly quite close. Now the expectation of the
characteristic function is just equal to the probability of E. In symbols,
Ex(C(E)) = P(E). This statement can be made without knowing P(E).

Generalizing this notion, we can look for a score which has the property
that the expectation of the score for P minus the expectation of the score
for R is just the distance squared between R and P. In symbols, we can try
to design a score S(R) such that $Ex(S(P)) - Ex(S(R)) = (R - P)^2$. This goal
is, in fact, achievable. The quadratic score described below has this property.
However, it turns out that a somewhat weaker requirement leads to a more
fruitful theory.


No matter how we want to measure the discrepancy between R and P, we
clearly would like the measure to be a minimum when R equals P. In symbols,
we would like Ex(S(P)) - Ex(S(R)) to be a minimum at R = P. In somewhat
more expressive notation, we would like a score function S(R,E), which assigns
a score given that E occurs, where R is the estimated probability of E. For
estimates of the probability distribution $R = (R_1,\ldots,R_m)$ for a partition
$E = (E_1,\ldots,E_m)$ of U, we can write the score as S(R,j), the score given
that event $E_j$ occurs, and R is the estimate. Our discrepancy condition then
becomes

\[ \sum_j P_j S(R,j) \le \sum_j P_j S(P,j) \tag{6} \]

This formulation allows for the possibility that the score associated with
event $E_j$ may depend on the entire distribution R, rather than just on the
estimate $R_j$ of the probability for the event $E_j$.

(6) in one variant or another, and in a plethora of interpretations,
has formed the basis for a large proportion of recent investigations in
probabilistic scores. The condition, of course, does not guarantee that
there is a function S(R,j) that fulfills (6); however, as it turns out,
there is a large family of such functions. A number of examples are listed
later on. The remarkable thing about (6) is that despite the simplicity of
its origin, it imposes a number of properties on scores which are desirable
in light of their potential role in decisions.

The family of scoring rules characterized by (6) has been called proper
scores, admissible scores, or reproducing scores, the latter from the fact
that for a probability distribution P, the maximum expectation is obtained
when R = P.
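The reproducing property can be illustrated by a brute-force search. The sketch below assumes the quadratic score listed later in this section, a two-alternative partition, and a hypothetical true distribution P = (0.7, 0.3); it scans reports R = (r, 1 - r) on a grid and finds where the expected score is largest.

```python
def quadratic_score(report, j):
    """Quadratic score: S(R,j) = 2*R_j - sum_k R_k^2."""
    return 2.0 * report[j] - sum(r * r for r in report)

def expected_score(true_probs, report):
    """Left side of (6): sum_j P_j * S(R,j)."""
    return sum(p * quadratic_score(report, j) for j, p in enumerate(true_probs))

true_probs = [0.7, 0.3]                     # hypothetical true distribution P
grid = [i / 100.0 for i in range(101)]      # candidate values of r
best = max(grid, key=lambda r: expected_score(true_probs, [r, 1.0 - r]))
print(best)  # the report with the highest expected score
```

The search is only a numerical illustration on one example, not a proof; the general argument is condition (6) itself.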

The desirable properties of scores fulfilling (6) include the following:
(a) The score is operational, that is, it can be assigned on the basis of a
single instance. If a weather forecaster says "The probability of rain
tomorrow is R" and it doesn't rain, an index can be attached to his forecast
without waiting for a thousand forecasts. How useful that single score may
be is another matter. (b) The score rewards the forecast for accuracy; i.e.,
the expectation increases as the report R gets closer to the actual
probability. (c) With a small additional assumption, the score rewards the
forecast for definiteness; i.e., the expected score increases as R tends toward
probability one for some alternative, and toward zero on the others. (d) The
score rewards a forecaster for honesty. If the forecaster believes Q and
asserts R, then his subjective expectation is a maximum when Q = R. This
last property has been used as a basis for imposing (6) by many investigators
who are dubious of "objective" probabilities. (e) The score rewards the
estimator for increasing his information concerning the events before
formulating his report; i.e., his expected score is greater if based on more
information. (f) S(R,j) can be employed as a figure of merit for general
statements of the form "All A's are B's" if this is translated as P(B|A) = 1.

The last property suggests a simple resolution to a long-standing
controversy concerning the usefulness of a probability logic. Although a number
of attempts have been made to formulate the probabilistic analogue of
traditional two-valued logic, none of these has caught the imagination either of
logicians or of decision analysts. The reason would appear to be that these
attempts have been based on the assumption that the appropriate analogue for
the two-valued truth values true and false in the case of probability
statements is just the probability. Conceiving of truth values as scores, the
appropriate analogue in the case of probability statements would be the score
as defined by (6). Some additional comments on this possibility will be
made in the next section.

Properties (b) and (c) are established in the following results. We
first note that the expectation "averages out" the individual alternative
events, thus we can abbreviate $\sum_j P_j S(R,j)$ by G(P,R), and
$\sum_j P_j S(P,j)$ by H(P). With this notation, (6) becomes G(P,R) ≤ H(P). A
fundamental property of scoring functions defined by (6) is that H(P) is
convex; that is, H(aP + (1-a)P') ≤ aH(P) + (1-a)H(P') where 0 < a < 1. In
words, H is convex if the average of H at two different points is greater than
or equal to the value of H for the average of the two points. A convex function
is one such that a line (or hyperplane) tangent to the function at some point
always remains below the function.

Theorem 1. H(P) is convex.

Proof: Let P'' = aP + (1-a)P', whence

\begin{align*}
H(P'') &= \sum_j \bigl(aP_j + (1-a)P'_j\bigr)\,S(P'',j) \\
&= a\sum_j P_j S(P'',j) + (1-a)\sum_j P'_j S(P'',j) \\
&= aG(P,P'') + (1-a)G(P',P'')
\end{align*}

From (6), G(P,P'') ≤ H(P) and G(P',P'') ≤ H(P'), whence
H(P'') ≤ aH(P) + (1-a)H(P').
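The convexity of H can be spot-checked numerically for two of the rules listed later in this section, for which H(P) = Σ P_j² (quadratic score) and H(P) = Σ P_j log P_j (logarithmic score). This is a sampling check under assumed random distributions, not a substitute for the proof; all function names are illustrative.

```python
import math
import random

def h_quadratic(p):
    return sum(x * x for x in p)

def h_log(p):
    return sum(x * math.log(x) for x in p if x > 0)

def mix(a, p, q):
    """The point a*P + (1-a)*P' used in the convexity definition."""
    return [a * x + (1.0 - a) * y for x, y in zip(p, q)]

def random_distribution(rng, m):
    weights = [rng.random() + 1e-9 for _ in range(m)]
    total = sum(weights)
    return [w / total for w in weights]

def is_convex_on_samples(h, trials=1000, m=3, seed=0):
    """Check H(aP + (1-a)P') <= aH(P) + (1-a)H(P') on random triples."""
    rng = random.Random(seed)
    for _ in range(trials):
        p, q = random_distribution(rng, m), random_distribution(rng, m)
        a = rng.random()
        if h(mix(a, p, q)) > a * h(p) + (1.0 - a) * h(q) + 1e-12:
            return False
    return True

print(is_convex_on_samples(h_quadratic), is_convex_on_samples(h_log))
```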
Theorem 2. S(R,j) is a maximum when $R_j = 1$.

Proof: If $R_j = 1$, H(R) = S(R,j) ≥ G(R,R') = S(R',j).

D1. R is more accurate than R' if R = aR' + (1-a)P, where 0 < a < 1.

This is a somewhat restricted definition of more accurate. In a
sense, each specific score rule defines its own special brand of
accuracy (closeness to P). However, there is no question but that
if R is on a ray between R' and P, R is closer to P.

Theorem 3. If R is more accurate than R', then G(P,R) ≥ G(P,R').

Proof: Since R = aR' + (1-a)P, we can write
P = (R - aR')/(1-a). Thus

\[ G(P,R) = \frac{1}{1-a}\bigl(G(R,R) - aG(R',R)\bigr). \]

Similarly,

\[ G(P,R') = \frac{1}{1-a}\bigl(G(R,R') - aG(R',R')\bigr). \]

Since G(R,R) ≥ G(R,R') and G(R',R') ≥ G(R',R),
the result follows.

D2. A score function S(R,j) will be called normal if, when R is an
equipartition (all $R_j$ equal), S(R,j) = S(R,k) for all j and k.

Theorem 4. If G is a normal scoring system, H(P) is a minimum at the
equipartition, $P_j = 1/m$ for all j.

Proof: Denote the equipartition by $\bar{P}$. From D2,

\[ H(\bar{P}) = \sum_j \frac{1}{m} S(\bar{P},j) = S(\bar{P},j). \]

\[ G(R,\bar{P}) = \sum_j R_j S(\bar{P},j) = S(\bar{P},j) = H(\bar{P}) \le G(R,R) = H(R). \]

D3. R is more definite than R' if $R' = aR + (1-a)\bar{R}$, 0 < a < 1, where
$\bar{R}$ is the equipartition.

If R is farther away from a uniform distribution than R' in the
sense that R' lies on a ray between R and $\bar{R}$, clearly some of the
$R_j$ are greater than the corresponding $R'_j$, whereas for those
$R_j < 1/m$, $R_j < R'_j$.

Theorem 5. If G is normal and R is more definite than R', H(R) ≥ H(R').

Proof: $H(R') \le aH(R) + (1-a)H(\bar{R})$, and since $H(\bar{R})$ is a minimum,
H(R') ≤ H(R).

Theorems 3-5 can be tightened up somewhat if further restrictions are
placed on S(R,j), for example, if it is assumed that S(R,j) is symmetrical.
However, they are sufficient to give content to the assertion that any (normal)
proper score rewards the estimator for definiteness and for accuracy.
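Theorems 3 and 5 can be illustrated with the quadratic score, which is normal in the sense of D2. In the sketch below the distributions are hypothetical; `r_near` is built on the ray between `r_far` and P (definition D1), and `r_less` on the ray between `r_def` and the uniform distribution (definition D3).

```python
def quadratic_score(report, j):
    return 2.0 * report[j] - sum(r * r for r in report)

def G(p, r):
    """Expected score of report r when the probabilities are p."""
    return sum(pj * quadratic_score(r, j) for j, pj in enumerate(p))

def H(p):
    return G(p, p)

def mix(a, x, y):
    return [a * xi + (1.0 - a) * yi for xi, yi in zip(x, y)]

p = [0.6, 0.3, 0.1]                  # hypothetical true distribution P
r_far = [0.2, 0.3, 0.5]              # R'
r_near = mix(0.5, r_far, p)          # R = aR' + (1-a)P, more accurate than R'

uniform = [1.0 / 3.0] * 3            # the equipartition
r_def = [0.7, 0.2, 0.1]              # R
r_less = mix(0.5, r_def, uniform)    # R' = aR + (1-a)*uniform, less definite

print(G(p, r_near) >= G(p, r_far))   # Theorem 3: accuracy is rewarded
print(H(r_def) >= H(r_less))         # Theorem 5: definiteness is rewarded
```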

Consider a set $\mathcal{R}$ of R's such that, except for $R_k$ for some k, the
remaining $R_i$'s are in proportion; that is, $R_i = a_i(1-R_k)$ for $i \ne k$,
with $\sum_{i \ne k} a_i = 1$ and $0 < a_i < 1$. We can call such a set
determined by k.

Theorem 6. If $\mathcal{R}$ is determined by k, then for R in $\mathcal{R}$,
S(R,k) is monotonic in $R_k$.

Proof: Let R and R' be members of $\mathcal{R}$, where $R'_k > R_k$.
From (6) we have

\[ \sum_j R_j S(R',j) \le \sum_j R_j S(R,j) \]

\[ \sum_j R'_j S(R,j) \le \sum_j R'_j S(R',j) \]

Whence $\sum_j (R_j - R'_j)\bigl(S(R',j) - S(R,j)\bigr) \le 0$, that is

\[ (R_k - R'_k)\bigl(S(R',k) - S(R,k)\bigr) + (R'_k - R_k)\sum_{i \ne k} a_i\bigl(S(R',i) - S(R,i)\bigr) \le 0 \]

By assumption, $R'_k - R_k > 0$, thus if S(R',k) - S(R,k) < 0,
$\sum_{i \ne k} a_i\bigl(S(R',i) - S(R,i)\bigr) \le 0$. But this implies
$\sum_j R'_j S(R',j) < \sum_j R'_j S(R,j)$, contrary to (6).

Corollary. For a two-alternative score rule, S(R,j) is monotonic in $R_j$.

Proof: Immediate since any two-alternative R is determined by $R_j$.

Theorem 6 establishes a useful monotonicity property for any proper scoring
function S(R,j).

With regard to property (e), suppose an estimator is faced with the option
either of making his judgment based on what he knows, or of obtaining further
information. Can anything be said about his expected score taking one or the
other option? You would probably expect that either the score obtained with
additional information is better than the score without, or something is
seriously wrong with the score rule. As it turns out, condition (6) is
sufficient to show that any proper score rule has this desirable property.* In
order to demonstrate this, we need some additional notation. Let $R_i$ be as
usual the estimated probability for event $E_i$. Assume that "obtaining
additional information" can be described by another partition
$I = (I_1, I_2, \ldots)$ of U, to be interpreted as the alternative states of
the world that could be identified by a particular information search. Given
that the search specifies $I_j$ as the state of the world, the individual can
then estimate the probability $(R_i | I_j) = R_{ij}$ for the event $E_i$.

* The analysis that follows is a generalization of a result due to Raiffa.

Evaluating the two options before the fact, i.e., before the decision to
seek further information or not, there exists a certain probability that the
search will identify each of the $I_j$ as the state of the world. We can
designate these probabilities as $P(I_j)$. They are not represented as estimates
since the general result will be independent of these probabilities. However,
if the decision were being made in a practical context, it would be necessary
to estimate these probabilities in order to determine whether the additional
information justified whatever costs were involved. With the additional
notation we can assert

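Theorem 7.

\[ \sum_i R_i S(R,i) \le \sum_j P(I_j) \sum_i R_{ij} S(R_{\cdot j}, i) \]

where $R_{\cdot j}$ denotes the distribution $(R_{1j}, R_{2j}, \ldots)$
estimated given $I_j$.

Proof: From the rule of elimination, F(1), Chap. II,
$R_i = \sum_j P(I_j) R_{ij}$. Substituting for $R_i$ in the left side
of the theorem we get

\[ \sum_i \sum_j P(I_j) R_{ij} S(R,i) = \sum_j P(I_j) \sum_i R_{ij} S(R,i) \]

By (6), $\sum_i R_{ij} S(R,i) \le \sum_i R_{ij} S(R_{\cdot j}, i)$ for every j,
whence the theorem follows.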

This result is quite general. It applies to any scoring rule that
fulfills (6). As we have seen, the result is independent of the probabilities
$P(I_j)$. One assumption is concealed in the notation, namely that the
estimates $R_i$ and $R_{ij}$ "act like probabilities," and in particular
$P(I_j)$ and $R_{ij}$ combine in the proper way.
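A small numerical instance of Theorem 7 may help. The sketch below assumes the quadratic score, two events (say rain and no rain), and a hypothetical information search with two equally likely outcomes; the prior estimate is obtained from the conditional estimates by the rule of elimination, so the estimates "combine in the proper way" by construction.

```python
def quadratic_score(report, j):
    return 2.0 * report[j] - sum(r * r for r in report)

def expected_score(probs, report):
    return sum(p * quadratic_score(report, i) for i, p in enumerate(probs))

def prior_from_conditionals(info_probs, conditionals):
    """Rule of elimination: R_i = sum_j P(I_j) * R_ij."""
    m = len(conditionals[0])
    return [sum(pj * cond[i] for pj, cond in zip(info_probs, conditionals))
            for i in range(m)]

info_probs = [0.5, 0.5]                    # hypothetical P(I_1), P(I_2)
conditionals = [[0.9, 0.1], [0.3, 0.7]]    # hypothetical estimates given I_1, I_2
prior = prior_from_conditionals(info_probs, conditionals)

# Left side of Theorem 7: expected score of reporting the prior estimate.
score_without = expected_score(prior, prior)
# Right side: expected score of reporting the conditional estimate after the search.
score_with = sum(pj * expected_score(cond, cond)
                 for pj, cond in zip(info_probs, conditionals))
print(score_without, score_with)
```

The comparison is made before the fact, averaging over both outcomes of the search, which is the sense in which the theorem holds.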

This theorem is the analogue for scoring rules of the refinement theorem
of Marschak.⁸ The refinement theorem states that the only condition under
which an information pattern will lead to an improved payoff over another,
independently of the payoff function or the specific probability assignment, is
if the first is a refinement of the second, i.e., if the first is a partition
of the second. In Theorem 7 we have assumed that the partition I is a
refinement of the (implicit) information pattern available to the individual.

Theorem 7 has a number of applications. It substantiates the intuitive
attitude that additional information is always a good thing. This will be
investigated further in Chap. V. It is a general schema that can be exploited
to show that two heads are better than one, as is done in Chap. V. Finally, it
illuminates a puzzle that can plague the evaluation of a specific piece of
information. The theorem holds before the fact, i.e., before a specific $I_j$ is
established as the state of the world. After the fact, the expected score
from a given $I_j$ may be smaller than the expected score without $I_j$. It is
only on the average that the expectation is increased. Although the present
conceptual apparatus is not quite sufficient to express this notion completely,
it is germane to point out that any specific piece of information may decrease
the accuracy of an estimate. We can predict with complete confidence that in
the normal course of events, once in a while a solid, relevant piece of
information will decrease the accuracy of an estimate. This point flies in
the face of standard lore which holds that increased information always
improves estimates.

A list of the more frequently employed proper scoring procedures would
include:

1. "Scientific"*: S(R,j) = 1 if $R_j$ is the maximum of the $\{R_i\}$,
otherwise zero.

This simple scoring scheme can be interpreted as the justification for
the ordinary scoring of objective tests, i.e., counting 1 for each correct
answer and 0 for each incorrect answer. Presumably, the testee checks the
answer that he thinks is most likely to be correct.

2. Brier Score. Defined only for a two-alternative estimate.
$S(R,j) = (1-R_j)^2$. H(R) = R(1-R).

This score, devised by Brier, has been extensively used by the
U.S. Weather Bureau to evaluate the probabilistic estimates of weather
forecasters. It is slightly anomalous in that a lower score indicates better
performance.

* The name is suggested by Marschak.⁸

3. Quadratic Score. $S(R,j) = 2R_j - \sum_k R_k^2$

Brown reports that the quadratic score is the only one for which the
difference between the expected score of a "perfect" forecaster, i.e., one that
announces P, and one that announces R is a function solely of P-R. Note that
$H(P) - G(P,R) = \sum_j (P_j - R_j)^2$. A complete graph of the quadratic score
for two alternatives is presented in Fig. 29, Chap. IV.

4. Spherical Score. $S(R,j) = R_j \big/ \sqrt{\sum_i R_i^2}$

The spherical score rule is notable for the fact that it is not concave.
Fig. 22 is a plot of the spherical score for the two alternative case.
$H(R) = \sqrt{\sum_i R_i^2}$.

5. Logarithmic Score. $S(R,j) = \log R_j$

The logarithmic score rule has a number of properties which set it apart
from the others. (a) It is the only rule which depends solely on the
probability reported for the event that occurs. (b) It is the only rule which
is additive over successive estimates. (c) It is a close analogue of the Shannon
entropy. Note that $H(R) = \sum_j R_j \log R_j$. (d) It is the only score rule
which is invariant over logically equivalent estimates. These properties will
be explored more thoroughly in the next section.

The above list of proper score rules is a thin sampling of the range of
scoring methods that can be devised fulfilling (6). Further examples will be
discussed in Section 4. The fact that none of them has demonstrated an
overwhelming superiority can be interpreted in either of two ways: (1) The
field is still immature. (2) There are many different roles that a scoring
procedure can play in decision analysis and no one of these dominates the
others. I am inclined to follow the second interpretation.
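The five rules listed above can be written down and tested against condition (6) on a grid of two-alternative reports. This is a sketch under stated assumptions: the true distribution is hypothetical, ties in the "scientific" rule are broken in favor of the first maximum (the text does not specify tie-breaking), and the Brier score is treated as a penalty for which (6) holds with the inequality reversed, since a lower score is better.

```python
import math

def scientific(report, j):
    # Assumption: on ties, only the first maximal alternative is "checked".
    return 1.0 if j == report.index(max(report)) else 0.0

def brier(report, j):
    return (1.0 - report[j]) ** 2          # penalty: lower is better

def quadratic(report, j):
    return 2.0 * report[j] - sum(r * r for r in report)

def spherical(report, j):
    return report[j] / math.sqrt(sum(r * r for r in report))

def logarithmic(report, j):
    return math.log(report[j])

def expected_score(true_probs, report, rule):
    return sum(p * rule(report, j) for j, p in enumerate(true_probs))

def satisfies_condition_6(rule, true_probs, reports, lower_is_better=False):
    """True if no report in `reports` beats reporting true_probs in expectation."""
    sign = -1.0 if lower_is_better else 1.0
    best = sign * expected_score(true_probs, true_probs, rule)
    return all(sign * expected_score(true_probs, r, rule) <= best + 1e-12
               for r in reports)

true_probs = [0.7, 0.3]                                  # hypothetical P
reports = [[i / 100.0, 1.0 - i / 100.0] for i in range(1, 100)]

for rule in (scientific, quadratic, spherical, logarithmic):
    print(rule.__name__, satisfies_condition_6(rule, true_probs, reports))
print("brier", satisfies_condition_6(brier, true_probs, reports, lower_is_better=True))
```

For the quadratic rule the gap between H(P) and G(P,R) can also be compared directly with the squared distance between P and R, as noted under item 3.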
The preceding section approached probabilistic figures of merit from the
standpoint of reproducing scores, those which have a maximum expectation when
the estimate coincides with the objective probability. Although satisfying in
many ways, the approach has the drawback that it has a restricted range of
application, namely to estimates of the probability distribution on a
partition of U. Straightforward extensions to continuous distributions exist,
but these also have an analogous limitation. In many practical situations
estimates are required for other logical forms, e.g., relative probabilities,
disjunctive or conjunctive combinations, and the like. In many cases, initial
estimates are transformed in various ways before they enter into a final
decision.

Another way of making the same point is that many logically equivalent
forms exist for a given probabilistic estimate. There is no simple way to
accompany these transformations with corresponding transformations on scores.
In addition, many of the reproducing scores give quite different values when
applied to logically equivalent estimates.

We could add the condition that a reproducing score should be invariant
under logically equivalent transformations of the estimate and determine the
scores which fulfill this limitation. However, it turns out that this
condition is extremely strong, so much so that it almost determines the form of
the score by itself. For this reason, it is instructive to begin with a more
general kind of estimate and examine the consequences of the equivalence
condition.

We first generalize the notion of an estimate from the specification of a
probability distribution on a partition to a probability tree. In practice,
few estimates are solitary; they generally occur in a sequence. Marketing
estimates for a business enterprise are iterated at more or less regular
intervals. Professional weather men issue a steady stream of forecasts of
tomorrow's weather. Development managers periodically revise estimates of
likely completion dates for projects, and the like.

This type of iterated estimate can be modelled by a probability tree.
Starting at some initial point, the set of (near term) potential events can be
displayed as branches with corresponding probabilities. Events that might
issue from each of the initial branches can be represented by further
branching, and so on. An elementary example for weather forecasts is given in
Fig. 23. Two possible states for tomorrow, rain and no rain, expand into four
possible states for day after tomorrow, rain following rain, no rain
following rain, etc. The probabilities of given weather states day after
tomorrow will depend on tomorrow's weather. The probability of rain day after
tomorrow (at least in Southern California) is higher if it rains tomorrow than
if it doesn't.

The branching structure contains the notion of relative probability. It
also contains the notion of conjunctive and disjunctive events. The event
"rain day after tomorrow following rain tomorrow" is a conjunctive event.
The event "rain tomorrow" is a disjunctive event in the context of the
tree; it is equivalent to the event "rain tomorrow and rain day after tomorrow,
or rain tomorrow and no rain day after tomorrow."

The elements of a probability tree estimate are a set K = {o, x, y, z, w, ...}
of nodes (events); o is the origin (base) of the tree. Defined on K is a
relation xLy, meaning x is the immediate predecessor of y. o is the only node with
no immediate predecessor. The ancestral relation xL*y is defined as: there is
a sequence $x_1, \ldots, x_n$ of nodes, $x = x_1$ and $y = x_n$, and $x_i L x_{i+1}$
for all i < n.


Figure 23. Meteorological Probability Tree

L(x) designates the immediate predecessor of x. B(x) denotes the set of nodes
with the same immediate predecessor as x. L*(x) denotes the set of y such that
yL*x plus x, i.e., L*(x) is the set of "ancestors" of x including x. Defined
on the branches B(y) following a given node x is a probability distribution,
where the probability of a given branch y will be designated by P(y), and the
distribution itself will be designated by $\mathbf{P}(y)$. Thus, if z is a member of
B(y), $\mathbf{P}(y) = \mathbf{P}(z)$. This structure is illustrated in Fig. 24. Finally,

$$P^*(x) = \prod_{y \in L^*(x)} P(y).$$

P*(x) is the product of the probabilities of all the nodes
on the path leading from o to x. P(o) = 1.

Two probability trees K and K' are defined as being equivalent if there
is a 1-1 correspondence between the endpoints of the two trees, and if x, x'
are corresponding endpoints, then P*(x) = P*(x'). The normal form of a tree
K is the tree K' which consists of a single stage, with as many branches as
there are endpoints in K, and where for every endpoint x in K there is a
corresponding branch x' in K' with P(x') = P*(x). It is an immediate consequence
of this definition that two trees $K_1$ and $K_2$ are equivalent if and only if their
corresponding normal forms $K_1'$ and $K_2'$ are equivalent.
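The definitions of L*(x), P*(x) and the normal form can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout, the node names and the branch probabilities of the two-day weather tree are hypothetical and are not taken from Fig. 23.

```python
from math import prod

# Hypothetical two-day weather tree: node -> (immediate predecessor, P(node)).
# The branch probabilities are illustrative assumptions, not values from the text.
TREE = {
    "o": (None, 1.0),              # origin, P(o) = 1
    "rain": ("o", 0.2),
    "dry": ("o", 0.8),
    "rain/rain": ("rain", 0.5),
    "rain/dry": ("rain", 0.5),
    "dry/rain": ("dry", 0.1),
    "dry/dry": ("dry", 0.9),
}


def ancestors(tree, x):
    """L*(x): the nodes on the path from the origin to x, including x."""
    path = []
    while x is not None:
        path.append(x)
        x = tree[x][0]
    return path


def path_probability(tree, x):
    """P*(x): product of the branch probabilities along the path to x."""
    return prod(tree[y][1] for y in ancestors(tree, x))


def endpoints(tree):
    """Nodes that are not the immediate predecessor of any other node."""
    parents = {parent for parent, _ in tree.values()}
    return [x for x in tree if x not in parents]


def normal_form(tree):
    """Single-stage tree: one branch per endpoint x, with P(x') = P*(x)."""
    return {x: path_probability(tree, x) for x in endpoints(tree)}
```

Two trees can then be compared for equivalence by comparing their normal forms endpoint by endpoint.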

To define a probabilistic score for an estimate K, we assume there is a
function S(x) defined on each node of the tree. S(o) = 0. In non-technical
terms, S is a reward, paid upon the occurrence of x. Thus, S could be a score
assigned to the weather forecaster after the verification of each forecast,
or it could be a fee paid to a marketing consultant after the close of each
forecast period.

The expected score ES(K) of the estimate K is defined as

$$ES(K) = \sum_x P^*(x)\, S(x) \qquad (7)$$

Four conditions complete the theory:

C1. S(x) is a function of $\mathbf{P}(x)$.

C2. S(x) is normal, i.e., if y is a member of B(x) and $\mathbf{P}(y)$ is a
uniform distribution, then S(y) = S(x).

C3. If K is equivalent to K', then ES(K) = ES(K').

C4. S(x) is continuous in $\mathbf{P}(x)$.

C1 imposes the condition that the score is determined locally, depending
only on the probability distribution on the fellow branches of a given branch.
C2 is the analogue of D2 for reproducing scores. The essential condition is
C3. If two estimates are equivalent, they will generate equal expected scores.

Theorem  8.  The  only  function  S(x)  fulfilling  C1-C3  is 

S(x)  = c log  P(x) . 

Proof: If $\mathbf{P}(x)$ is a uniform distribution, then from C1 and C2, we
can write S(x) = S(1/n) where n is the number of alternatives
in B(x). Consider a two-stage tree, K, where there are n
branches in the first stage, and m branches at each second
stage. Designate the nodes of the first stage as $x_i$ and the
endpoints of the second stage as $x_{ij}$. Assume that $\mathbf{P}(x_i)$ and
$\mathbf{P}(x_{ij})$ are uniform distributions. Thus $S(x_i) = S(1/n)$ and
$S(x_{ij}) = S(1/m)$. The normal form of K has n x m endpoints, $x'_{ij}$, with a
uniform distribution, $S(x'_{ij}) = S(1/mn)$. C3 requires that

$$\sum_i P(x_i) S(x_i) + \sum_i P(x_i) \sum_j P(x_{ij}) S(x_{ij}) = \sum_{i,j} P(x'_{ij}) S(x'_{ij}).$$

Thus

$$n\left(\tfrac{1}{n} S(1/n)\right) + n\left(\tfrac{1}{n}\, m\left(\tfrac{1}{m} S(1/m)\right)\right) = nm\left(\tfrac{1}{nm} S(1/nm)\right),$$

whence S(1/n) + S(1/m) = S(1/nm). The only continuous function
with this property is c log P(x).* This proves the
theorem for uniform distributions.

* This fact is "well-known." For the curious, a proof is given in Appendix I.

To prove the result for non-uniform distributions, consider a tree
K consisting of two stages, where the probability distribution on the
branches of the first stage is not uniform (cf. Fig. 25). Assume
that P(i) is of the form $a_i / \sum_k a_k$ with $a_i$ an integer. At each
endpoint i of the first stage there are $a_i$ branches. Assume that $\mathbf{P}(ij)$ is
uniform, i.e., $P(ij) = 1/a_i$. There are thus $\sum_i a_i$ endpoints to the
two-stage tree, and $P^*(ij) = (a_i / \sum_k a_k)(1/a_i) = 1/\sum_k a_k$. The normal form K' of
K is thus a single stage with $\sum_i a_i$ endpoints and with a uniform
probability distribution. Thus, from our previous result, $ES(K') = \log(1/\sum_i a_i)$.

$$ES(K) = \sum_j \frac{a_j}{\sum_i a_i}\, S(\mathbf{P}, j) + \sum_j \frac{a_j}{\sum_i a_i} \log\frac{1}{a_j}$$

Equating ES(K) and ES(K') we obtain

$$\sum_j \frac{a_j}{\sum_i a_i}\, S(\mathbf{P}, j) = \sum_j \frac{a_j}{\sum_i a_i} \log\frac{a_j}{\sum_i a_i}$$

Invoking continuity, we arrive at

$$\sum_j p_j\, S(\mathbf{P}, j) = \sum_j p_j \log p_j$$

which was to be proved.
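The role of condition C3 in the proof can be checked numerically. The sketch below is an illustration only: it assumes a score that depends on a node's own branch probability alone (a special case of C1), uses the integer weights $a_i$ = 1, 2, 3 as a hypothetical example, and compares the expected score (7) of a two-stage tree with that of its normal form, once for the logarithmic score and once for a linear score.

```python
from math import log


def expected_score_two_stage(first, second, score):
    """ES(K) per (7) for a two-stage tree.

    first[i] is P(i); second[i][j] is P(ij), the branch probability below i.
    score(p) is the reward paid at a node whose branch probability is p.
    """
    total = 0.0
    for p_i, branches in zip(first, second):
        total += p_i * score(p_i)                # first-stage node, weight P*(i)
        for p_ij in branches:
            total += p_i * p_ij * score(p_ij)    # endpoint, weight P*(ij)
    return total


def expected_score_normal_form(first, second, score):
    """ES(K') for the single-stage normal form, where P(x') = P*(x)."""
    return sum(p_i * p_ij * score(p_i * p_ij)
               for p_i, branches in zip(first, second)
               for p_ij in branches)


# Tree built as in the proof: P(i) = a_i / sum(a), P(ij) = 1 / a_i.
a = [1, 2, 3]                                    # hypothetical integer weights
first = [a_i / sum(a) for a_i in a]
second = [[1 / a_i] * a_i for a_i in a]

log_gap = (expected_score_two_stage(first, second, log)
           - expected_score_normal_form(first, second, log))
linear_gap = (expected_score_two_stage(first, second, lambda p: p)
              - expected_score_normal_form(first, second, lambda p: p))
```

With these inputs `log_gap` vanishes up to rounding while `linear_gap` does not, which is the sense in which the logarithmic score respects equivalence and the linear score violates it.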

The two basic assumptions leading to this result are C1, the score for
a given node is a function of the probability distribution on the fellow
branches of the node, and (7), the assumption of conditional additivity.
That these two quite general conditions could specify the form of the score
precisely without invoking any ordering conditions is quite surprising.
Of course, in order to use the score for a figure of merit, the constant c
must be assigned, and depending on the sign of c, the score can increase
with increasing probabilities, or decrease. However, the question of sign
appears to be secondary. The Brier score is one in which a small score is
desirable, but that creates no great confusion in interpretation.

Figure 25. Probability Tree with Nonuniform Probabilities in First Stage

The logarithmic score would thus, at first glance, appear to be the
only reasonable candidate for the analogue of truth values for probabilistic
statements. It has one nice formal property which can be stated as

$$S(E.F) = S(E) + S(F) \quad \text{if } E \text{ and } F \text{ are independent}$$

$$S(E.F) = S(E) + S(F|E) = S(F) + S(E|F) \quad \text{otherwise} \qquad (8)$$

Unfortunately, there is no similar neat expression for $S(E \vee F)$ or for $S(\bar{E})$.
They can, of course, be computed from the probabilities of the components,
but not in a simple functional form. The expression S(E|F) is of some
interest. There is no two-valued logical function corresponding to (E|F).
S(E|F) = log P(E|F) = log P(E.F) - log P(F).

The logarithmic score has another property that is sometimes considered
a drawback, namely if the estimator announces $R_j = 1$ for some event $E_j$, and $E_j$
does not occur, then his score is negatively infinite. If one wanted to be
moralistic about the matter, one could say that it served the individual
right: no one can assert a statement about the world with absolute certainty,
and that is just what the logarithmic score recommends. However, in practice,
it is somewhat of a bore to qualify highly certain statements with some such
hedge as $P(E) = 1 - \epsilon$ where $\epsilon$ is some small number. In many applications
of the logarithmic scoring rule, the investigator does just this for the
estimator; that is, the estimated probabilities are truncated at some point
close to 1 and close to zero. There is something slightly unsatisfactory
about this tactic, since the computed score may be highly sensitive to the
truncation point.
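The sensitivity to the truncation point is easy to see numerically. The following sketch is an illustration, not a procedure taken from any of the applications mentioned: the function name and the particular truncation points are assumptions, and the constant c of the score is taken as 1.

```python
from math import log


def truncated_log_score(reported, occurred, eps):
    """Log score of the event that occurred, with the reported
    probability clipped to the interval [eps, 1 - eps]."""
    p = min(max(reported[occurred], eps), 1.0 - eps)
    return log(p)


# The estimator announces certainty for event 0, but event 1 occurs.
report = [1.0, 0.0]
scores = {eps: truncated_log_score(report, 1, eps)
          for eps in (1e-2, 1e-4, 1e-6)}
```

Each hundredfold reduction of the truncation point lowers the score of the mistaken estimator by a further log 100, while reports that lie strictly inside the interval are unaffected.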

The analogy of Theorem 8 with Shannon's theorem that the entropy
$-\sum_i p_i \log p_i$ is the only function that is continuous and conditionally
additive for probabilistic message sources, is quite close. The basic
difference is the starting point. An identical theorem could be proved for
information, if, for example, one started by defining a notion of amount of
information in a message (rather than the uncertainty of a source) and assumed
that the expected information from a source had the form (7).

Despite all the advantages that the logarithmic score has going for it,
it has not received the overwhelming endorsement of the community interested
in evaluating estimates. The advantages do not appear to add up to any dramatic
practical consequences. Some relevant substantive considerations will be
raised in the following chapter.

4.  Decisional  Scores 

The scores that have been discussed so far could be called informational;
they concern the degree to which a response is correct. In the decision
context, there is a more natural criterion, namely, to what extent does a response
improve the outcome of the decision? To deal with this question in full
generality it would be necessary to first develop the theory of utility. For
simplicity in this section I will assume that the outcome of a decision can be
assessed on a value scale which is linear in probabilities, i.e., a utility
function.


Referring back to Fig. 1 in the Introduction, a decision can be
characterized by a set of actions $A_i$, a set of events $E_j$, and a matrix of
outcomes $O_{ij}$. Given a utility function $U(O_{ij}) = U_{ij}$ and a probability assignment
$R(E_j) = R_j$, the expected utility for action $A_i$, $U(A_i) = U_i$, is just $\sum_j R_j U_{ij}$.
The decision rule normally associated with this analysis is, choose the action
$A_i$ that maximizes $U_i$. We first show that any decision matrix, with the
maximize-expected-utility decision rule, is a proper scoring rule for the
probability estimate R. Define U*(R,j) as the $U_{ij}$ for the action $A_i$ that
maximizes $\sum_j R_j U_{ij}$. By definition, for any other probability assignment Q,

$$\sum_j R_j\, U^*(R,j) \ge \sum_j R_j\, U^*(Q,j) \qquad (9)$$

(9) is precisely of the form (6) with S(R,j) = U*(R,j).
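The claim that a decision matrix with the maximize-expected-utility rule behaves as a proper scoring rule can be checked on a small case. The sketch below is only an illustration: the 3-action, 2-event utility matrix, the distribution P and the random reports are hypothetical. It compares the expected payoff of acting on the governing distribution P with the expected payoff of acting on an arbitrary report R.

```python
import random


def expected_utilities(U, R):
    """U_i = sum_j R_j * U[i][j] for each action i."""
    return [sum(r * u for r, u in zip(R, row)) for row in U]


def best_action(U, R):
    """Index of the action that maximizes expected utility under R."""
    eu = expected_utilities(U, R)
    return max(range(len(U)), key=eu.__getitem__)


def decisional_score(U, R, j):
    """U*(R, j): payoff in event j of the action chosen on the basis of R."""
    return U[best_action(U, R)][j]


def expected_score(U, P, R):
    """Expected payoff when P governs the events but the action is chosen from R."""
    return sum(p * decisional_score(U, R, j) for j, p in enumerate(P))


def random_distribution(n, rng):
    """A random probability vector of length n."""
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical utility matrix: rows are actions, columns are events.
U = [[10.0, -5.0], [4.0, 4.0], [-6.0, 9.0]]
P = [0.3, 0.7]
rng = random.Random(0)
honest = expected_score(U, P, P)
worst_gap = min(honest - expected_score(U, P, random_distribution(2, rng))
                for _ in range(1000))
```

`worst_gap` is never negative: no report leads to a higher expected payoff than reporting P itself, although many reports tie with it because they select the same action.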

It is thus possible to apply most of the general properties of proper
score rules derived in the previous section to any decision matrix. However,
most decision matrices are not normal. Thus, Theorems 4 and 5 do not
hold in general for decisional score rules.

The converse of the statement that any decision matrix defines a proper
score rule also holds; i.e., any proper score rule can be represented by a
decision matrix. This follows trivially if the actions are defined to be the
report of a probability distribution R on a set of events $E_j$, with the
corresponding utilities defined as S(R,j). The triviality is, of course, that the
optimal action for any assignment R of probabilities to the events is just R itself.

A somewhat more revealing exposition of the converse can be made if we
start off with a general decision space X, i.e., X describes the potential
actions. Consider any set of functions $f_j(x)$, j = 1, ..., m. This set of
functions defines a proper scoring rule for a probability estimate
$R = (R_1, \ldots, R_m)$ under the rule: select the x such that $\sum_j R_j f_j(x)$ is maximized.
This is really just another way of saying (9), with $f_j(x) = U_{xj}$. If the
$f_j$ are differentiable, the maximizing x satisfies

$$\sum_j R_j \frac{\partial f_j}{\partial x} = 0.$$

Call the solution to this system of equations x*(R), the x that maximizes
$\sum_j R_j f_j(x)$ given R. Then $f_j(x^*(R)) = g_j(R)$ is the proper score rule defined
by the set of functions $f_j$. If the $f_j$ are initially a proper score rule,
$g_j(R) = f_j(R)$.

For example, if X is two-dimensional, so that it can be specified by the
single parameter x, $0 \le x \le 1$, and $f_1(x) = \sqrt{x}$, $f_2(x) = \sqrt{1-x}$, differentiating
and performing the substitution gives $g(R) = R/\sqrt{R^2 + (1-R)^2}$, the spherical
scoring rule. If $f(x) = x^2$, g(R) is the Brier scoring rule. Since, on this
approach, there are no restrictions on the form of the functions $f_j$ (other
than differentiability), it is a convenient way to derive a wide variety of
informational scoring rules and at the same time obtain some insight into the nature
of the kind of decision which is being made with that rule; i.e., the kind of
payoff function which is (implicitly) being maximized in applying the rule.
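The square-root example can be verified without any calculus by maximizing the payoff directly. The sketch below is an illustration under stated assumptions: it replaces the differentiation step by a grid search over x (the grid size is arbitrary), takes $f_1(x) = \sqrt{x}$ and $f_2(x) = \sqrt{1-x}$, and compares the induced score $f_1(x^*(R))$ with the spherical rule.

```python
from math import sqrt


def best_x(R, steps=100_000):
    """Grid search for the x in [0, 1] maximizing R*sqrt(x) + (1-R)*sqrt(1-x)."""
    grid = (k / steps for k in range(steps + 1))
    return max(grid, key=lambda x: R * sqrt(x) + (1 - R) * sqrt(1 - x))


def induced_score(R):
    """g(R) = f_1(x*(R)) with f_1(x) = sqrt(x)."""
    return sqrt(best_x(R))


def spherical(R):
    """Spherical score of the first event for the binary report (R, 1 - R)."""
    return R / sqrt(R**2 + (1 - R)**2)
```

For any R strictly between 0 and 1, `induced_score(R)` and `spherical(R)` agree to within the resolution of the grid.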

A somewhat more intricate application of the approach can be made if
the notion of repetitive decisions is introduced. In the section on equivalent
estimates it was pointed out that many kinds of estimates are iterated in a
fairly routine manner. In effect, this is a symptom of the fact that many
decisions are iterated rather routinely. For simplicity, consider a prototype
decision matrix with a fixed number of rows and columns, but with
varying values $U_{ij}$. We can conceive of each such matrix as a member of a sequence
of decision problems where the form of the decision remains the same, but the
$U_{ij}$, and the relevant probabilities, change from case to case. For each case,
the decision maker selects the action which is optimal for his estimate
R of the probabilities. The decision function will map the space of matrices
onto a partition $\mathcal{U}_i$, where $\mathcal{U}_i$ is the set of matrices for which the action i is
optimal. Strictly speaking the $\mathcal{U}_i$ need not form a partition, since for some
matrices more than one action may be optimal; but for simplicity we assume that such
matrices are assigned to only one set $\mathcal{U}_i$.

If we assume that there is a joint distribution D(U) on the space of
matrices, and assume in addition that the decision maker's estimate R is
independent of this distribution, then the expected payoff is

$$\sum_i \int_{\mathcal{U}_i} \sum_j R_j\, U_{ij}\, D(U)\, dU \qquad (10)$$

Or, interchanging the summation signs,

$$\sum_j R_j \sum_i \int_{\mathcal{U}_i} U_{ij}\, D(U)\, dU \qquad (11)$$

Thus, we can set

$$S(R,j) = \sum_i \int_{\mathcal{U}_i} U_{ij}\, D(U)\, dU \qquad (12)$$

(12) is the decisional score rule defined by the sequence of the decisions
with "random" matrices.

As a simple example, suppose we have an individual who engages in
frequent bets on a binary event, e.g., win-lose types of bets on athletic
contests. Each bet has the decision matrix

                      E            Ē

    1. Bet on E    (1-u)/u        -1

    2. Bet on Ē      -1         u/(1-u)

where the outcomes are the appropriate odds for a bet on an event with
probability u. Thus, we assume someone offers the stated odds and the
individual can choose which side to bet on, with a standard bet of 1. Following
the maximization rule, the individual would bet on E if his subjective
probability for E is greater than u, otherwise he would bet on Ē.

We obtain a strategically equivalent matrix if we add 1 to each entry,
giving the matrix

                      E            Ē

    1. Bet on E      1/u           0

    2. Bet on Ē       0         1/(1-u)

Now suppose the individual is presented with a sequence of such bets, where
the distribution of u in the sequence is D(u). In this case $\mathcal{U}_1$ is just
$0 \le u \le R$, and $\mathcal{U}_2$ is $R < u \le 1$. Since $U_{12} = U_{21} = 0$, we have from (12)

$$S(R,1) = \int_0^R \frac{D(u)}{u}\, du, \qquad S(R,2) = \int_R^1 \frac{D(u)}{1-u}\, du \qquad (13)$$

Thus, the gambling sequence generates a variety of score rules depending on
D(u). For example, if D(u) = constant, the logarithmic score rule ensues.
If D(u) = k u(1-u), the spherical rule is generated, and so on.
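The integrals in (13) can be evaluated numerically for a given D(u). The sketch below is an illustration under stated assumptions: it uses a simple midpoint rule, and it truncates the odds parameter u to [eps, 1 - eps] because for constant D(u) the integrands are unbounded at the ends of the interval; both the truncation and the value eps = 0.01 are choices of this sketch, not of the text. For constant D(u) the scores then differ from log R and log(1 - R) only by the constant -log eps.

```python
from math import log


def betting_scores(R, D, eps=0.01, steps=20_000):
    """Evaluate (13) by the midpoint rule, with u truncated to [eps, 1 - eps]
    (the truncation is an assumption of this sketch)."""
    def integral(f, lo, hi):
        h = (hi - lo) / steps
        return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

    s1 = integral(lambda u: D(u) / u, eps, R)            # S(R, 1): E occurs
    s2 = integral(lambda u: D(u) / (1 - u), R, 1 - eps)  # S(R, 2): E does not occur
    return s1, s2


def expected_score(Q, R, D, eps=0.01):
    """Expected score of the report R for an individual who believes Q."""
    s1, s2 = betting_scores(R, D, eps)
    return Q * s1 + (1 - Q) * s2


def constant(u):
    """D(u) = 1 on the truncated interval."""
    return 1.0
```

For example, scanning R over a grid with `expected_score(0.3, R, constant)` places the maximum at R = 0.3, as a proper score rule requires.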

The sequential model shows that a score rule may look very different
derived from a single decision compared with one derived from a sequence of
similar decisions. The informational score rules, which may seem irrelevant
in the case of a specific decision, can make a great deal of sense if the
given decision is embedded in a sequence.

From this point of view, the logarithmic score rule would be appropriate
for the "complete ignorance" situation where any given "opportunity" (the
betting odds parameter u) is as likely as any other. The spherical rule would
be appropriate if the distribution of "opportunities" is peaked about u = 1/2,
with relatively few at the extremes, and so on. For many of the simpler
kinds of decisions, especially for betting decisions, the distribution of
opportunities can be obtained empirically. For such cases, the appropriate
score rule could be computed from the data. An investigation of this topic
would shed some light on the role of various informational score rules in
guiding decision procedures. This point will be expanded in Chapter V in
discussing the evaluation of group probability judgments by informational
score rules.

Decisional score rules have been called "piece of the action" rules by
Savage, in the context of rewarding a consultant for his advice. One way to
reward a consultant is to pay him a certain proportion of the profit his
advice creates for the enterprise. Raiffa calls decisional score rules
"naturally imputed" rules, on the grounds that they derive directly from a
decision problem. Some of the issues implied by these terms will be taken
up in the next section on scores as motivators.

5.  Motivational Role of Scores

The basic emphasis in this chapter has been on scores as measures of
excellence, essentially accuracy, as measured by the distance from the true
answer, or less directly, the regret (the difference in expected utility
achievable by the correct judgment and a given estimate). Lurking in the
background has been the notion that scores also can act as motivators; an
individual will tend to maximize his expected score. Thus, it is usually
thought that school grades are both a measure of the performance of a
student, and also a spur to better performance. In many traditional economic
texts, wages are treated primarily as rewards which motivate workers to perform
requisite tasks. The boundary line between scores and rewards (or reinforcing
agents, to use the Skinnerian term) is thus quite fuzzy.

What brings this topic alive is the possibility that a given scoring
scheme may backfire. What appears to be a completely reasonable measure of
excellence can induce behavior that is quite contrary to what was intended in
formulating the score. Suppose there is a specific kind of response Q which
is desired. Consider a function S which rewards an individual with S(R,T)
if the individual responds with R and criterion T applies. Such a function
will motivate the individual to perform Q if his anticipated reward F(Q,R,T),
computed from S(R,T), is a maximum when R = Q.*

An example might clarify the role of these two functions. The following
example is due to Marschak.¹² Suppose we have an individual who has an
article for sale, e.g., a house. He sets some value on this article, which
we may as well think of as its worth to him in money. Most bargaining
procedures tend to motivate the individual (initially at least) to set a
higher price on the article than its worth to him. Can an exchange process
be designed that would motivate the individual to reveal his "true price"?
The following scheme will do it. The potential buyer submits a sealed bid, T.
Without knowing the bid, the seller announces a price, R. The bid is then
compared with the price. If the price is less than or equal to the bid, the
exchange is made at the bid amount. In this case the desired behavior is
announcing the true price, Q. S(R,T) = T if R ≤ T, otherwise 0. F(Q,R,T) =
T - Q if R ≤ T, otherwise 0.

* If this scheme were to be taken seriously as a practical technique for
eliciting the response Q, a number of other conditions would be imposed on
the functions S and F. Among these would be that S represents the total
"sum of rewards" involved in the situation; that F is apparent to the
individual; that the maximum is not too flat; that the reward is appropriately
timed with respect to the behavior; and so on. However, for purposes of
theoretical investigation, the usual issue is the appropriateness of a given
response R, not the probability that it will be elicited in fact. Thus,
these practical conditions are commonly omitted.

Figure 26. Bidding Procedure to Motivate Honesty

Figure 26 displays the situation. Potential asking prices R define the
vertical scale and potential bids T define the horizontal scale. To the left
of the 45° line, R > T, no exchange takes place, and S(R,T) = 0. Similarly,
F(Q,R,T) = 0 in this region. In region A, F is negative. In the region to
the right of the 45° line, and beyond T = Q, F is positive, and independent of
R. Thus, for any T, F(Q,R,T) ≤ F(Q,Q,T). The seller cannot lose if he announces
Q, and he may lose if he announces an R ≠ Q. In standard parlance, announcing
Q dominates any other announcement.
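The dominance argument can be checked by brute force. The sketch below is an illustration only: the function names, the true price of 100 and the price grid are hypothetical. It encodes F(Q,R,T) as defined for the sealed-bid scheme and confirms that, for every bid on the grid, announcing Q does at least as well as announcing any other price.

```python
def seller_gain(Q, R, T):
    """F(Q, R, T): the seller's net gain when his true price is Q,
    he announces R, and the sealed bid is T."""
    return T - Q if R <= T else 0.0


def honesty_dominates(Q, prices, bids):
    """True if announcing Q is never worse than any R in prices, for every bid."""
    return all(seller_gain(Q, R, T) <= seller_gain(Q, Q, T)
               for R in prices for T in bids)


# Hypothetical grid of asking prices and bids.
grid = [float(v) for v in range(0, 201, 5)]
```

With a true price of 100, `seller_gain(100.0, 80.0, 90.0)` is a loss of 10 (a sale below true worth), and `seller_gain(100.0, 120.0, 110.0)` is 0 where an honest announcement would have gained 10.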

The example of a voting scheme described in Chap. VI, Sec. 5, employing
random selection of a final slate, is another illustration of a scoring scheme
which generates a desired kind of estimate, in this case the honest rating of
a candidate.

An informative example of a presumed proper scoring scheme which fails
rather miserably can be found in test grading. It is common practice in
scoring objective examinations to do what is called "correcting for guessing."
The basic assumption is that the student can get the right answer by
guessing at least 50% of the time. To discourage the student from guessing,
his recorded score is computed as C - W/(a-1), where C is the number
correct, W is the number incorrect, and a is the number of alternative
answers to the question. For true-false questions, the score is just C - W
(rights minus wrongs). The speech which accompanies this score goes like
this: if the student guesses, and guesses wrong, he will be "punished" by
subtracting a point from his score.

It is a simple exercise to show that this scheme is futile (providing
the assumption on which it is based is correct). Suppose the probability
that a given student will give the correct answer on question j is $q_j$. His
expected score, if he responds to every question, is
$\sum_j (q_j - (1 - q_j)) = 2\sum_j q_j - n$, where n is the number of questions in the
test. If he responds to only m of the questions (for whatever reason), his
expected score is $2\sum_{j=1}^{m} q_j - m$. The difference between the two scores is
$2\sum_{j=m+1}^{n} q_j - (n-m)$. Since the basic assumption is that $q_j \ge 1/2$,
$2\sum_{j=m+1}^{n} q_j \ge n-m$, and the difference is non-negative. Thus, the student
never loses (on the average) by guessing.*
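The arithmetic of the argument can be reproduced for a particular student. The sketch below is an illustration only: the probabilities $q_j$ are hypothetical, and the test is assumed to be true-false so that the score is C - W.

```python
def expected_rights_minus_wrongs(q, answered):
    """Expected C - W on a true-false test when only the questions whose
    indices are in `answered` are attempted; q[j] is the chance of a right answer."""
    return sum(q[j] - (1 - q[j]) for j in answered)


# Hypothetical student: fairly sure on three questions, near guessing on two.
q = [0.9, 0.8, 0.7, 0.6, 0.5]
all_answered = expected_rights_minus_wrongs(q, range(len(q)))
sure_only = expected_rights_minus_wrongs(q, range(3))
```

Here `all_answered` is at least as large as `sure_only`; attempting the two doubtful questions costs nothing in expectation as long as each $q_j$ is at least 1/2.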

In the following chapter we will see that the assumption that the student
always has an expectation of at least 1/2 in guessing the answer to a question is
just false, and that, in fact, this myth covers a major gap in the theory of
test design. For the present discussion, however, the moral is that a scoring
system which is in widespread use and which is intended to motivate a kind of
behaviour, simply doesn't do its job.

Many Bayesians who object to the notion of "correct probability" for
single events, prefer to approach the theory of probabilistic scores from the
aim of motivating the estimator to be honest. Thus, if the individual believes
Q is the probability distribution on a set of events, he might be motivated to
report something different, depending on how he is rewarded for his estimate.
The condition on the score that will motivate honesty is just

$$\sum_j Q_j\, S(R,j) \le \sum_j Q_j\, S(Q,j) \qquad (14)$$

That is, the individual's subjective expectation is a maximum when he reports
his believed probabilities. (14) is formally identical to (6), with Q
replacing P. Hence (14) leads to the same family of scoring rules as (6).

It is easy to formulate probability scores which appear reasonable,
but which violate (14). If $S(R,j) = R_j$, i.e., the score is just the reported
probability of the event that occurs, several intuitively reasonable
conditions are met. The score increases with the reported probability. H(R) is
convex. In fact $H(R) = \sum_j R_j^2$, which is identical to H(R) for the quadratic
scoring rule. Nevertheless, $S(R,j) = R_j$ does not fulfill (14), and it is
easy to see that if this scoring rule is used, the individual will always
report either 0 or 1 for the probability of an event. His expected score
$\sum_j Q_j S(R,j) = \sum_j Q_j R_j$ is maximized when he reports 1 for the event with
maximum Q and 0 for all the others.

* Of course, he may increase the variance of his score by responding to
questions where $q_j$ is close to 1/2. But I have yet to see a justification for the
procedure which invokes a trade-off between expected score and variance.
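The failure of the linear score to satisfy the honesty condition can be seen on a small case. The sketch below is an illustration only: the belief vector Q is hypothetical, and for comparison it assumes one common form of the quadratic rule, $S(R,j) = 2R_j - \sum_i R_i^2$, which has the same $H(R) = \sum_j R_j^2$.

```python
def expected_linear_score(Q, R):
    """Expected score sum_j Q_j * R_j under the rule S(R, j) = R_j."""
    return sum(q * r for q, r in zip(Q, R))


def expected_quadratic_score(Q, R):
    """Expected score under S(R, j) = 2*R_j - sum_i R_i**2
    (one common form of the quadratic rule, assumed here for comparison)."""
    return sum(2 * q * r for q, r in zip(Q, R)) - sum(r * r for r in R)


Q = [0.6, 0.3, 0.1]           # hypothetical beliefs
certain = [1.0, 0.0, 0.0]     # all weight on the most probable event
```

Under the linear rule the overconfident report `certain` has a higher subjective expectation than the honest report Q; under the quadratic rule the ordering is reversed.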

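A somewhat more amusing case is $S(R,j) = R_j^2$. This might look at first
sight like a variant of the quadratic score. However, the expression
$QR^2 + (1-Q)(1-R)^2$ is stationary precisely when R = (1-Q), and the
stationary point is a minimum, since the expression is convex in R. Treated
like the Brier score, as a score in which a small value is desirable, it motivates
the individual to report the "opposite" of what he believes!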

The non-intuitive nature of these "anomalies" suggests a second look
at the distance scores introduced in Sect. 1. Although from the standpoint
of measuring discrepancies between a report and a true answer they appear
impeccable, what can be said for them from the viewpoint of motivating
responses?

A subject answering a question in the laboratory with the instructions
"make as good an estimate as you can," must be guided by some rough idea as
to what the experimenter thinks is a "good answer." Or lacking any guidance
in the instructions, he can only proceed on what he thinks is a good answer.
By now it should be clear that there is no well defined content to the term
"good answer." Suppose, for example, that the subject thinks that the
proportional score is reasonable: he would like, if possible, to make a small
percentage error. Thus, in a loose way, he is trying to minimize |R - T|/T.
If we assume he has a subjective probability distribution D(T) on T, then he
will seek to minimize his expectation of his proportional error; that is, he
will try to minimize

$$\int \frac{|R - T|}{T}\, D(T)\, dT$$

with the integral extending over his subjective range for T. If the minimum
is computed, it turns out to occur at R* where

$$\int_{T \le R^*} \frac{D(T)}{T}\, dT = \int_{T \ge R^*} \frac{D(T)}{T}\, dT .$$

R* could be called the proportional median. It is an unusual statistic.
Generally, it is quite small, smaller even than the harmonic mean. For example,
if D(T) is a uniform distribution between a and b, then $R^* = \sqrt{ab}$. Suppose
a = 10, b = 100. The mean of this distribution is 55, the geometric mean is
47.5, the harmonic mean is 39.09, and R*, the proportional median, is 31.62.
If the subjects in this experiment were realistic in the sense that the
average of their subjective probability distributions correspond rather well with
the true answers, then a large majority of their answers would appear to be
underestimates.
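The four location statistics of the uniform example can be computed directly, and the minimizing property of the proportional median can be checked by numerical integration. The sketch below is an illustration under stated assumptions: the closed forms are the standard ones for a uniform distribution on [a, b] with 0 < a < b, and the midpoint rule with an arbitrary number of steps stands in for the integral of |R - T|/T.

```python
from math import exp, log, sqrt


def uniform_summaries(a, b):
    """Mean, geometric mean, harmonic mean and proportional median
    of a uniform distribution on [a, b], with 0 < a < b."""
    mean = (a + b) / 2
    geometric = exp((b * log(b) - a * log(a)) / (b - a) - 1)   # exp(E[log T])
    harmonic = (b - a) / log(b / a)                             # 1 / E[1/T]
    proportional_median = sqrt(a * b)   # halves the integral of D(T)/T
    return mean, geometric, harmonic, proportional_median


def expected_proportional_error(R, a, b, steps=20_000):
    """Midpoint-rule estimate of the expected value of |R - T| / T."""
    h = (b - a) / steps
    midpoints = (a + (k + 0.5) * h for k in range(steps))
    return h / (b - a) * sum(abs(R - t) / t for t in midpoints)
```

Calling `uniform_summaries(10, 100)` returns the four statistics in the order listed, and `expected_proportional_error` is smaller at the proportional median than at the harmonic mean or at neighbouring values of R.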

In an extensive series of experiments at the Rand Corporation, with
college student subjects and general information type questions, a majority of
the responses were in fact underestimates.¹³ The 65th percentile was a
better estimator of the true answer than either the mean or the geometric
mean.

The assumption that the subjects were expressing the proportional median
of their subjective probability distributions appears a little too drastic.
Suppose we invoke the theory of errors model and make the following
assumptions:

(1) The subjects have a subjective probability distribution
D(T) on the potential answers.

(2) Following the psychonumeric hypothesis, D(T) is
log normal.

(3) The subjects are roughly realistic, i.e., the actual answer is

$$A = e^{\int \log T \, D(T)}$$

(4) The individual evaluates his answers with the error-squared score

$$\int \frac{(R - T)^2}{T} \, D(T).$$

The last assumption implies that the individual will respond with the
harmonic mean of his distribution. The harmonic mean of a log normal
distribution can be computed readily; it is $e^{\mu - \sigma^2/2}$, where, as in
Chapter II, Sec. 7, μ is the mean of the log transform distribution and σ is
its standard deviation. Thus, the harmonic mean occurs at the 31st percentile.
This is to be compared with the observed 35th percentile. Restating the point,
in the Rand data, about 65% of the responses were underestimates. On the
present theory, about 69% would be underestimates. Considering the fact that
the Rand data are based on a variety of sample sizes (ranging from 13 to 29) on
different questions, and that we are smearing the averages over a large
population with a small number of questions, the figure does not appear out of
line.
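
The percentile at which the harmonic mean falls depends only on σ, because the logarithm of the harmonic mean lies σ/2 standard deviations below the mean of the log transform distribution. The sketch below makes that step explicit. It assumes σ = 1 in natural-log units, a value that is not stated in this passage but that reproduces the 31st percentile quoted above; the function names are illustrative.

```python
import math
from statistics import NormalDist

def lognormal_harmonic_mean(mu, sigma):
    """Harmonic mean 1/E[1/T] of a log normal distribution: exp(mu - sigma^2/2)."""
    return math.exp(mu - sigma ** 2 / 2)

def percentile_of_harmonic_mean(sigma):
    """Fraction of the distribution lying below the harmonic mean.

    ln(harmonic mean) = mu - sigma^2/2, which is sigma/2 standard deviations
    below mu, so the fraction is Phi(-sigma/2) regardless of mu.
    """
    return NormalDist().cdf(-sigma / 2)

# Assumed value sigma = 1 (not given in this passage).
print(round(100 * percentile_of_harmonic_mean(1.0), 1))
```

A smaller σ moves the harmonic mean closer to the median, so the predicted share of underestimates is sensitive to the assumed spread.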

The data, as analyzed, are not sufficient to assert with high confidence
that the subjects were indeed responding with the harmonic mean of their
subjective probability distributions. However, the results are highly
suggestive. In particular, they suggest that possibly much of the apparent bias
observed in probability estimates could be due to completely reasonable
behavior on the part of the subjects. They may be responding to an implicit
scoring rule which has consequences that the experimenter has not anticipated.

In the present context, we can examine the implied response for a
variety of scoring rules. These are obtained by minimizing on R the integrals

$$\int S(R,T) \, D(T)$$

Minimum expected score responses for various
(magnitude estimation) scores

    Score S(R,T)                 Minimizing response
    1. |R - T|                   Median
    2. (R - T)^2                 Mean
    3. |R - T| / T               Harmonic Median
    4. (R - T)^2 / T             Harmonic Mean
    5. |Log R - Log T|           Geometric Median
    6. (Log R - Log T)^2         Geometric Mean

If we apply the preceding type of analysis to probabilistic scores we
arrive at a surprising result. Consider an individual who has a subjective
distribution D(P) over the range K of possible objective probabilities P. His
expected score for response R will be

$$\int_K \sum_j P_j S(R,j) \, D(P) = \sum_j \bar{P}_j S(R,j) \qquad (15)$$

where $\bar{P}_j$ designates the average of the $P_j$. According to 6, (15) is
maximized when $R = \bar{P}$. This result is independent of the kind of score
and follows from the linearity of G(P,R) in P. This result becomes of
importance when one tries to use probabilistic scores as a method of
clarifying the notion of uncertainty, expressed, e.g., as a higher level
distribution on the estimates.
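
Returning to the table of magnitude-estimation scores, the pairing of each score with its minimizing response can be checked numerically. The sketch below is an illustration under stated assumptions: it replaces the subjective distribution D(T) with a small equally weighted sample (the five values are invented), and it finds each minimizing response by brute-force grid search rather than by calculus.

```python
import math

# The six scores S(R, T) from the table, keyed by their formulas.
SCORES = {
    "|R-T|": lambda r, t: abs(r - t),
    "(R-T)^2": lambda r, t: (r - t) ** 2,
    "|R-T|/T": lambda r, t: abs(r - t) / t,
    "(R-T)^2/T": lambda r, t: (r - t) ** 2 / t,
    "|log R - log T|": lambda r, t: abs(math.log(r) - math.log(t)),
    "(log R - log T)^2": lambda r, t: (math.log(r) - math.log(t)) ** 2,
}

def best_response(score, sample, lo, hi, step=0.01):
    """Grid-search the R in [lo, hi] that minimizes the total score over the sample."""
    n = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n + 1)]
    return min(grid, key=lambda r: sum(score(r, t) for t in sample))

# Hypothetical equally weighted sample standing in for D(T).
SAMPLE = [12, 20, 35, 50, 90]

for name, score in SCORES.items():
    print(f"{name:20s} {best_response(score, SAMPLE, 10, 100):7.2f}")
```

For the squared scores the grid search should land on the arithmetic, harmonic, and geometric means of the sample; for the absolute scores it lands on a sample point, the ordinary median or a median weighted by 1/T.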

An instructive application of these ideas can be found in the analysis
of the payment of a consultant. Suppose an expert has been hired by an
enterprise to furnish inputs to a decision problem. Theorem 7 states that the
expert will collect whatever additional information is available, i.e., within
the bounds of feasibility, he will try to make himself more expert on the
specific problem facing the enterprise. Whatever the form of payment, it will
be to his advantage to become more knowledgeable.

However, suppose the relationship between the consultant and the
enterprise is of one fairly common sort, where the expert is expected to help
structure the problem; that is, he is expected to give advice concerning the
relevant events to be taken into account, potential actions, and the like. To
simplify the point, suppose the consultant is merely asked to suggest the
relevant events, and of course, furnish a probability assignment for them. If
the consultant is paid according to one of the informational score rules, then
it will be to his advantage to make the list of events as small as possible.

Put in informal language, it will be to his advantage to learn as much as
possible, and to tell his client as little as possible! This results from
the fact that informational score rules are, roughly speaking, monotonic in the
probabilities. By choosing a smaller event list, the consultant raises the
probabilities of the predicted events, and hence his expected score. For
example, if the expert is a geologist who is asked to forecast the probability
of an earthquake in Southern California over the next twenty years, and he has
the choice between estimating the probability of earthquakes in a number of
magnitude classes, or of a simple dichotomy like major or minor, then an
informational score rule would motivate him to select the second forecast.*


* In actual practice, rewards for expert judgment are highly complex. A
consultant may prize his reputation more than money, and reputation may depend
more on precision of estimates than on accuracy. Of course, complex reward
situations of this sort may not be a proper scoring situation; the expert
may find his greatest reward in lying.
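
The incentive to shorten the event list can be illustrated with two scores that depend only on the reported probabilities. The sketch below is an assumption-laden illustration: it takes the logarithmic and quadratic scores as representative informational score rules, supposes the consultant reports his true probabilities, and uses invented probabilities for four earthquake magnitude classes.

```python
import math

def expected_log_score(probs):
    """Expected log score, sum_j p_j ln p_j, when the true probabilities are reported."""
    return sum(p * math.log(p) for p in probs if p > 0)

def expected_quadratic_score(probs):
    """Expected quadratic score, sum_j p_j^2, when the true probabilities are reported."""
    return sum(p * p for p in probs)

def coarsen(probs, groups):
    """Merge events; each group is a list of indices into probs."""
    return [sum(probs[i] for i in group) for group in groups]

# Hypothetical probabilities for four magnitude classes (not from the text).
FINE = [0.4, 0.3, 0.2, 0.1]
# Dichotomy: "minor" merges the first two classes, "major" the last two.
COARSE = coarsen(FINE, [[0, 1], [2, 3]])

for label, probs in (("four classes", FINE), ("dichotomy", COARSE)):
    print(label, round(expected_log_score(probs), 4),
          round(expected_quadratic_score(probs), 4))
```

Under both assumed scores the dichotomy earns the higher expected score, even though it conveys less to the client.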


Decisional score rules do not have the disadvantage just discussed.
Since the decisional score rule depends on the decision matrix as well as on
the estimated probabilities, there is no gain in simplifying the event list.
In fact, for those cases where refinement of the event list can lead to
increased expected payoff, if the consultant is rewarded with a "piece of
the action," he will be motivated to generate the appropriate refinement of
the events.


CHAPTER IV. NOMINAL JUDGMENTS


1. The Spectrum of Uncertainty

In Chapter II several theories of individual estimates were expounded,
and in Chapter III a variety of ways of measuring the excellence of such
estimates were explored. Underlying both chapters was the assumption that
human judgment can be used as a surrogate for data or theories, at least in a
decision context. Experimental investigation of human judgment has shown
that it is frequently very bad. A basic issue to be explored in this chapter
is the question whether there are situations in which it is preferable to
replace human judgment with something that is in a sense even weaker, namely
a form of estimate that can be called a nominal judgment.

By a nominal judgment is meant an assertion based on a quasi-logical
rule, such as the principle of indifference or the law of insufficient reason.
Typical examples are the assumption of equal a priori probabilities in some
kinds of statistical inference, or the equal weighting of individual responses
inherent in using a simple average to express the group response in a Delphi
exercise. Estimates of this sort are clearly a third kind of judgment,
very different in origin from statements based on "hard" data, or on
"intuition." They have led a sort of demi-monde existence in the past,
frequently employed, but not with a clear conscience.

Before  attempting  to  probe  the  foundations  of  this  sort  of  judgment,  it  is 
useful  to  look  at  a related  topic,  namely  the  range  of  excellence  that  esti- 
mates can  exhibit.  We  can  display  this  range  with  a crude  scale 

0                                              1
Ignorance            Opinion           Knowledge

                    Solidity ►
The quantity being scaled here is roughly the degree of verification, or the
amount of evidence, that exists for a statement. Because there have been many
attempts to generate a formal definition for this scale, none of which to
my knowledge have succeeded, I prefer to leave the notion informal and call
the quantity "solidity." Thus, judgments at the right end of the scale are
well-established and solid, those at the left end are unsubstantiated or
"flimsy."

At the far right are the well verified generalizations of natural science
and the equally solid generalizations of common sense, like "Unsupported
objects fall," or "Food is necessary to sustain life." At the far left are
statements for which there is no evidence whatsoever, but equally, no contrary
evidence. In Chapter II, it was pointed out that it is not clear whether
statements with zero solidity exist other than in theory. Just to understand
a sentence probably requires some relevant information. There is an extensive
literature, beginning at least as early as Hume, emphasizing the fact that the
ideal of a completely certain factual statement is just that; all empirical
statements are incompletely verified. In a similar way, the notion of a
statement for which no relevant information exists is probably an idealization.

In the middle of the range are assertions for which there is some evidence,
but not enough to inspire high confidence. I have called this range "opinion."
The term doesn't seem to have caught on for this application, possibly because
of its negative connotations.

At all events, it is this middle range which appears most appropriate for
applying the theories of estimation of Chapter II.

It would be an enormous step forward in the theory of estimation if there
were a well-defined measure for solidity. Attempts to define the concept based
on formal logic, such as Keynes' logical theory of probability and Carnap's
efforts to specify a degree of confirmation, have proven unsatisfactory.
Attempts to equate the scale with probability run afoul of a number of
problems, some of which will be discussed below in the section on uncertainty.
The scale is clearly related to the notion of degree of confidence, defined in
various ways in statistics, but most of these require special assumptions
(such as randomized sampling) which limit their application to instances of
formal data collection.

One relatively obvious tactic is to use a judgmental scale of solidity;
e.g., to elicit a confidence rating along with each estimate. The present
evidence suggests that individuals are no better at estimating the solidity
of a judgment than they are at making the original estimate. The illusion of
certainty phenomenon studied by Slovic, Lichtenstein, and others³ is a clear
case in point. In the series of experiments with college students and almanac
questions described in Chapter II, the subjects were asked to rate their
answers, usually on a scale from 1 to 5, where 1 meant "I'm just guessing"
and 5 meant "I know the answer." The correlation between these self-ratings
and log error was -.25. The negative sign is in the right direction, but the
size of the correlation is not impressive.

For high confidence statements (knowledge), there is no basic difficulty.
The rule for using such statements in decisions is simply: assume the statement
is true, and act accordingly. For statements in the middle range (opinion)
there are a number of open issues, but the rule, make the best estimate you can
and act accordingly, appears to have general acceptance. In this range,
expressing uncertainty with estimated probabilities is gaining credibility among
decision analysts. Serious conceptual problems arise in formulating rules for
incorporating statements at the low end of the solidity scale (ignorance)
in decisions. It is statements of this sort for which nominal judgments,
rather than estimates, are probably appropriate.

2. Uncertainty and Probability

There is a long-standing controversy concerning the question whether
probability completely encodes the notion of uncertainty. The distinction
between uncertainty (or lack of information) and risk (probability) has been
around at least since the writings of Frank Knight.⁴ However, there has been
no clear consensus on the subject among those interested in decision analysis.
Those who favor a subjectivist theory of probability have been inclined to
reject the distinction, on the grounds that when an individual makes a
probability judgment he is quantifying his degree of uncertainty. On this view,
an individual who says "The probability of event E is one-half" is saying
"I haven't the foggiest notion whether E will happen or not." Some objectivists
have also rejected the distinction, notably Reichenbach, who remained convinced
that the notion of probability was flexible enough to cover all instances of
incomplete information.

Objections to the identification of uncertainty with probability have
been raised by Allais and Ellsberg, who contend that the theory of subjective
probability does not describe the behavior of individuals making choices under
incomplete information. This topic will be expanded in Section 6.

On the face of it, the subjectivist position is hard to maintain.
Consider the assertion, "The probability of event E is one-half." As an
example, suppose there are two coins, one of which is well-known to the
estimator. Let's say he has flipped it many times, and has very good reason
to believe it is a fair coin. The other is an exotic object with which he has
had no prior experience. There is no contradiction on the subjectivist
theory of probability to suppose that he asserts "The probability of heads is
one-half" for each of the coins. And yet he may make the assertion for the
first coin with high confidence, and the assertion about the second coin with
little or no confidence. In this case, the ascription of probability one-half
to an event occurs with two very different states of knowledge.

The coin example shows that a probability judgment cannot, by itself,
express the degree of certainty with which the judgment is asserted. This
point is in accord with the view expressed in Chapter II that probabilities
are properties of the world, not of the estimator's state of knowledge. To
pursue an example raised then, an estimate of the height of a tree can be made
with any degree of solidity, and the solidity is unrelated to the number
expressing the height. Similarly, a probability estimate can be asserted with
any degree of solidity, and the solidity is not directly related to the
numerical probability.

The argument appears to require that an additional index other than
probability be formulated to express the solidity scale. In statistics, it
is common practice to attach a number, the significance level, to a derived
statistic. In scientific applications the practice is to "suspend belief"
if the significance level is too low. This scientific procedure is not much
help to a decision maker if the statistic is relevant to a pending decision, and
the significance level is below the accepted criterion. A low significance
level does not imply that the opposite of the hypothesis has a high significance
level.

The significance level is a special case of one suggestion for extending
the notion of probability to include uncertainty, namely to introduce
probabilities of a higher level. Thus, associated with any given estimate we
can conceive of a probability distribution for that probability. If the second
level distribution has a low dispersion, the estimate is relatively solid.
If the upper level distribution is flat, then the estimate is uncertain. This
suggestion is illustrated in Figure 27. The second-level distribution for the
familiar coin is peaked around one-half, while the second-level distribution
for the unfamiliar coin is relatively uniform.

Second-level probability distributions clearly do not eliminate the
problem, because the second level distribution may also be uncertain. Taken
seriously, the suggestion then leads to a third level, and so on. The only
one I know who whole-heartedly accepted the implied infinite set of higher
level distributions is Reichenbach. Most other investigators have felt that
an infinite sequence of higher level distributions creates many more problems
than it solves.

Nevertheless, the notion of a higher level distribution is useful for
exploring some kinds of relationships between uncertainty and probability.
For example, if we assume that a higher-level distribution approximately
expresses uncertainty, then we can assert a coupling between the dispersion
of the higher level distribution and the probability. This comes about because
the range of probability estimates is constrained between zero and one.
Assume that the individual asserts the average of his second-level distribution
as his first level response. Then it is impossible to assert a probability
close to one with high uncertainty. For example, if the upper level
distribution has the form $(n+1)p^n$, and the average is .95, then n = 18, and
σ = .0475. On the other hand, if the average is around .5, then the second
level distribution is not constrained; it can be about anything.
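
The coupling between average and dispersion in this example follows from the moments of the density $(n+1)p^n$ on the unit interval: its mean is (n+1)/(n+2) and its variance is (n+1)/((n+2)²(n+3)). The sketch below, with illustrative function names, inverts the mean to recover n and then evaluates the standard deviation.

```python
import math

def exponent_for_mean(mean):
    """Solve mean = (n + 1)/(n + 2) for n; requires 0.5 <= mean < 1."""
    return (2 * mean - 1) / (1 - mean)

def power_density_stats(n):
    """Mean and standard deviation of the density (n + 1) p^n on [0, 1]."""
    mean = (n + 1) / (n + 2)
    variance = (n + 1) / ((n + 2) ** 2 * (n + 3))
    return mean, math.sqrt(variance)

n = exponent_for_mean(0.95)
mean, sigma = power_density_stats(n)
print(round(n, 6), round(mean, 4), round(sigma, 4))
```

Within this one-parameter family, pushing the average toward one forces n upward and the standard deviation toward zero, which is the constraint described in the text.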

More generally, consider any event space U. If U consists of a discrete
set of events $\{E_j\}$, then the totality of the possible probability
distributions on U is described by the simplex $\sum_i P_i = 1$. The individual
may have a certain amount of information about U, which can be expressed by
saying that he knows that the probability distributions on U are limited to a
certain class K. By assumption, some distribution P in K is the true
distribution. If the individual asserts R = P, then his expected score will be
H(P). If he does not know P and asserts some R ≠ P, then his loss will be
H(P) - G(P,R). If there is an upper level distribution D(P) on the
distributions in K, then his expected loss will be

$$\int_K \bigl( H(P) - G(P,R) \bigr) \, D(P).$$

The expected loss will be minimized by selecting an R* to give

$$\min_R \int_K \bigl( H(P) - G(P,R) \bigr) \, D(P) \qquad (1)$$

Since H(P) does not depend on R, (1) is equivalent to finding an R* which
gives

$$\max_R \int_K G(P,R) \, D(P) \qquad (2)$$

Expanding (2) we have

$$\int_K \sum_j P_j S(R^*,j) \, D(P) = \max_R \int_K \sum_j P_j S(R,j) \, D(P)$$

And since S(R,j) does not depend on P

$$= \max_R \sum_j \Bigl[ \int_K P_j \, D(P) \Bigr] S(R,j)$$

$$= \max_R \sum_j \bar{P}_j S(R,j) \qquad (3)$$

where $\bar{P}$ is the average of P over K.

From the definition of a proper score, $R^* = \bar{P}$. This result is quite
general. It does not depend on the form of G(P,R), other than that it be a
proper score. It does not depend on the nature of the class K, other than that
G(P,R) D(P) be integrable over K. If K is the total simplex, $\sum_i P_i = 1$,
and D(P) is the uniform distribution, then $R^* = \bar{P}$ is the uniform
distribution on U.
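
The result can be checked on a small case. The sketch below is illustrative only: it takes K to be a finite class of three distributions with invented weights D(P), and it assumes the logarithmic score and a quadratic score of the form $2R_j - \sum_i R_i^2$ as examples of proper scores. The expected score at the average distribution is then compared with the expected score at each member of K and at the uniform distribution.

```python
import math

def average_distribution(members, weights):
    """Weighted average of the distributions in K, component by component."""
    k = len(members[0])
    return [sum(w * p[j] for p, w in zip(members, weights)) for j in range(k)]

def log_score(r, j):
    """Logarithmic score S(R, j) = ln R_j."""
    return math.log(r[j])

def quadratic_score(r, j):
    """Quadratic score S(R, j) = 2 R_j - sum_i R_i^2 (assumed form)."""
    return 2 * r[j] - sum(x * x for x in r)

def expected_score(score, members, weights, r):
    """Sum over K of D(P) * sum_j P_j S(R, j)."""
    return sum(
        w * sum(p[j] * score(r, j) for j in range(len(r)))
        for p, w in zip(members, weights)
    )

# Hypothetical class K and upper level weights D(P); not taken from the text.
K = [(0.7, 0.2, 0.1), (0.5, 0.4, 0.1), (0.3, 0.3, 0.4)]
D = [0.5, 0.3, 0.2]
P_BAR = average_distribution(K, D)

for score in (log_score, quadratic_score):
    print(score.__name__, round(expected_score(score, K, D, P_BAR), 4))
```

Because the expected score is linear in P, averaging over K first and scoring afterward gives the same number as scoring each member and then averaging, which is why only the average distribution matters.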

We can call the prescription, assert $\bar{P}$, the minimum loss rule for
estimates with incomplete information. It is, in a way, a generalization of the
principle of indifference. For decisional score rules, with matrix $U_{ij}$, it
recommends selecting the action $A_i$ such that $\sum_j \bar{P}_j U_{ij}$ is a
maximum. For the case that K is the total simplex, $\sum_i P_i = 1$, it
recommends, in effect, selecting the action with the highest row sum.
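
For a decision matrix, the rule reduces to an expected-payoff comparison using the average distribution. The sketch below uses an invented three-action, three-event payoff matrix and illustrative names; it shows that with a uniform average distribution the selected action is the one with the largest row sum, while a non-uniform average can select a different action.

```python
def best_action(matrix, p_bar):
    """Return (index of the action maximizing sum_j p_bar[j] * payoff[i][j], all expectations)."""
    expected = [sum(p * u for p, u in zip(p_bar, row)) for row in matrix]
    return max(range(len(matrix)), key=lambda i: expected[i]), expected

# Hypothetical payoffs: rows are actions, columns are events.
PAYOFFS = [
    [10, 0, -5],
    [3, 3, 3],
    [-2, 4, 8],
]

uniform = [1 / 3] * 3
print(best_action(PAYOFFS, uniform))          # complete ignorance case
print(best_action(PAYOFFS, [0.8, 0.1, 0.1]))  # some information about U
```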

Given the assumption that upper level distributions are a reasonable
approximation to uncertainty and that a uniform upper level distribution is
a sufficient description of complete ignorance, the min loss rule appears
rather inevitable. It has a number of attractive features in addition to those
already mentioned. The average distribution $\bar{P}$ is relatively easy to
compute. If a decision matrix has extreme values ("catastrophes" or
"windfalls"), the rule neither ignores them, nor is obsessed with them. And, to
anticipate the next section, in the "complete ignorance" form (a uniform upper
level distribution on K) the rule is a hedge against bias. However, the
assumption that upper level distributions are a sufficient approximation for
incomplete information remains to be established.


3. Counterprediction

A phenomenon that gives some additional insights into the nature of
uncertainty is counterprediction. Consider an individual such that, if he
asserts R, you are better off to believe not-R. According to the theory of
probability expressed in Chapter II, there is no such individual. Presumably,
if anyone would be better off to believe not-R, then, in particular, the
individual himself would be better off; so he also should, if his best estimate
is R, believe not-R, which is some kind of contradiction.


Nevertheless, there is good evidence that the phenomenon of
counterprediction is fairly common. In the theory of psychological test
construction, there is a concept called the difficulty of an item. The
difficulty, for a given population, is defined as the probability that a member
of that population will get the right answer, as diagrammed in Figure 28. The
interesting feature of this scale is that it covers the full range between 0
and 1. For those items with difficulty greater than the vertical line at d in
Figure 28, where the probability drops below one half, a typical member of the
population is a counter-predictor; you would be better off to reject his answer.

The second feature of interest for this scale is that there are examples
of items with difficulty greater than d. And the classification of such
items is well-defined in the sense that if they are scaled on a random
sample of the population, then a different sample will exhibit the same
proportion of individuals who get the correct answer. So far as I know there is
no general theory for such questions in the sense that they can be identified
without first trying them out on a sample of respondents.

The third interesting feature of the difficulty scale is that for
questions with difficulty greater than d, the individual would do better if
he reached in his pocket, drew out a coin, and flipped it to obtain his
answer (assuming a true-false question). For such questions, a good fair coin
is better than guessing. This is the hole in test theory that I referred to
in the discussion of the rights-minus-wrongs score. If the individual could
identify those questions for which he was a counterpredictor, he would do
better by relying on a chance mechanism. He also would do better with the
rights-minus-wrongs score by not answering such questions, since his
expectation is negative. However, in order for the rights-minus-wrongs score to
be effective, the individual must be able to identify those questions for which
he is a counterpredictor.
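
The comparison among answering, flipping a coin, and omitting the item can be written out for a true-false question. The sketch below assumes the rights-minus-wrongs score awards +1 for a right answer, -1 for a wrong answer, and 0 for an omitted item; the function names are illustrative.

```python
def expected_rights_minus_wrongs(p_correct):
    """Expected score per true-false item: +1 with probability p_correct, else -1."""
    return p_correct - (1.0 - p_correct)

def best_strategy(p_correct):
    """Compare answering from judgment, flipping a fair coin, and omitting the item."""
    options = {
        "answer": expected_rights_minus_wrongs(p_correct),
        "flip a coin": expected_rights_minus_wrongs(0.5),
        "omit": 0.0,
    }
    return max(options, key=options.get), options

print(best_strategy(0.35))  # a counterpredictor on this item
print(best_strategy(0.80))  # a competent respondent on this item
```

The coin and the omission tie at zero, so the practical difficulty is the one noted in the text: the respondent has to know on which items his own probability of being right is below one half.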

There have been a number of experiments exploring this issue, namely
experiments concerned with the realism of subjects in making probability
estimates. The data of Capen, Figure 9, Chapter II, are typical. The average
quadratic score for the 3160 responses in this data is .55. The expected
quadratic score for a complete ignorance response (i.e., for a response
of .5 to every question) is .5. Thus, the individuals on the average did
a little better than chance. However, for the roughly 40% of the responses
where answers of .7, .8, and .9 were given, the average score was .46,
distinctly worse than chance.

Figure 29 is a graph of the expected quadratic score, where R is the
individual's response, and P is the objective probability. Only half of the
graph is presented, since the other half is just a mirror image. The value .5
has been subtracted from the values to display the difference between the
expected score and what would be expected from the response .5. Thus, along the
line R = .5, and on the slanting line bounding the filled in area, the
difference is 0. It is no surprise that the expected score is negative if the
individual reports a probability greater than one half for events where the
objective probability is less than one half. However, the stippled area shows
a region where, despite the fact that both the report and the objective
probability are greater than one half, the individual's score is still worse
than chance. In the stippled area, the individual is a counterpredictor in
the weak sense that he would achieve a higher score if he said, "I don't
know", i.e., responded with .5.
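
The region of weak counterprediction for the quadratic rule can be located algebraically. The sketch below assumes the two-event quadratic score $S(R,j) = 2R_j - \sum_i R_i^2$, chosen because it gives the value .5 for a response of .5 as stated above; the exact form used for Figure 29 is not shown in this section. Under that assumption the normalized expected score simplifies to 2(R - .5)(2P - R - .5), which vanishes at R = .5 and on the line R = 2P - .5.

```python
def normalized_quadratic_score(p, r):
    """Expected quadratic score minus .5, for objective probability p and report r."""
    norm = r * r + (1 - r) * (1 - r)
    expected = p * (2 * r - norm) + (1 - p) * (2 * (1 - r) - norm)
    return expected - 0.5

def is_weak_counterprediction(p, r):
    """True when report and objective probability both exceed .5 yet .5 would score higher."""
    return p > 0.5 and r > 0.5 and normalized_quadratic_score(p, r) < 0

for p, r in [(0.6, 0.9), (0.7, 0.9), (0.75, 0.95), (0.9, 0.7)]:
    print(p, r, round(normalized_quadratic_score(p, r), 4),
          is_weak_counterprediction(p, r))
```

In words, under the assumed score a report is worse than "I don't know" whenever it overshoots one half by more than twice the amount the objective probability does.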

Figure 30 is a similar graph for the logarithmic scoring rule. Again, the
stippled area is the region where both the report and the objective probability
are greater than .5 and yet the expected score is less than the complete
ignorance score. Although the general features of the two graphs are the same,
they indicate that to some extent, the question whether a given response is
counterpredictive depends on the score rule. For example, R = .95, P = .75
is counterpredictive for the logarithmic score rule, but not for the quadratic
rule.

Figure 30. Expected Normalized Log Score:
G(P,R) = P log R + (1 - P) log (1 - R) - log 0.5
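
The dependence on the score rule can be made concrete by evaluating both normalized scores at the same point. The sketch below uses the normalized log score G(P,R) = P log R + (1 - P) log (1 - R) - log 0.5, with R the report and P the objective probability, and an assumed quadratic score of the form $2R_j - \sum_i R_i^2$ less .5. It also finds, by bisection, the report at which the log score drops to the complete ignorance level; the function names are illustrative.

```python
import math

def normalized_log_score(p, r):
    """G(P, R) = P log R + (1 - P) log(1 - R) - log 0.5."""
    return p * math.log(r) + (1 - p) * math.log(1 - r) - math.log(0.5)

def normalized_quadratic_score(p, r):
    """Expected quadratic score minus .5 (assumed form 2 R_j - sum_i R_i^2)."""
    norm = r * r + (1 - r) * (1 - r)
    return p * (2 * r - norm) + (1 - p) * (2 * (1 - r) - norm) - 0.5

def log_counterprediction_threshold(p, tol=1e-10):
    """Report R > p at which the normalized log score crosses zero; needs 0.5 < p < 1."""
    lo, hi = p, 1.0 - 1e-12  # score is positive at R = p and negative near R = 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if normalized_log_score(p, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p, r = 0.75, 0.95
print(round(normalized_log_score(p, r), 4), round(normalized_quadratic_score(p, r), 4))
print(round(log_counterprediction_threshold(p), 4))
```

With an objective probability of .75, a report of .95 is penalized by the logarithmic rule below the "I don't know" level, while under the assumed quadratic form it still scores slightly above it.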

Figure 31 shows a similar graph for the scientific score rule. The
picture is quite different than for the logarithmic and quadratic rules. There
is no region where R and P are both greater than .5 and the expected score is
less than for R = .5. The expected score is discontinuous at R = .5. Finally,
Figure 32 shows the expected decisional score for the inset decision matrix.
Here the expected payoff for any R less than 2/3 is precisely the same as for
R = .5, hence the normalized score is zero for this region. Here the
anomalous region where both P and R are greater than .5, but the normalized score
IS  less  f h 111  o .siiiM'le/f  i Mftennt  liuus  I h.in  tor  the  informational 

f.-  <*.•  ..jii..  »,  r*  I i.tf  li-'lne.'  f .ir  K = -^/i,  slme  the  decl.sion- 

itii.  ’ . ■ 1 ! ■>  , '*!•  nurahers  it  t. ached  to 

■>  . . • !'•  uiK.l1  in  .1  ratio  1/1, 

. i«-  I'.i  .iiil  .....re  rule  with 
The phenomenon of counterprediction appears to offer a relatively sharp
criterion for demarcating the area of ignorance within which nominal judgments
are to be preferred to intuitive estimates. Almost tautologically, if a
given question is one for which a given individual will make a counterpredictive
estimate, then that individual would be advised to make a "weaker" estimate.
There are several considerations which keep this point from being a pure
tautology. As we have seen, whether or not a given estimate is
counterpredictive depends on the score rule employed. More seriously, it
depends on the form of the nominal estimate with which it is being compared.
For the example above, the comparison was with the "I don't know" estimate
R = .5. Although this is a well-known and intuitively appealing criterion, it
assumes that a uniform distribution is the proper interpretation of "total
uncertainty" or "total ignorance." A number of well-known paradoxes cast doubt
on this interpretation.

4. Paradoxes of Uniformity

Traditionally, lack of information has been linked with the notion of
uniformity, e.g., the well known "Bayesian" assumption of uniform prior
probabilities in the absence of further knowledge. Most applications of rules
such as the principle of insufficient reason, or the rule of indifference, wind
up with uniform distributions on some event space. The traditional foundation
of probability based on the notion of "equi-possible cases" (probability is
the ratio of the number of favorable cases to all possible cases,
assuming they are all equally possible) has the same flavor.

Applied indiscriminately, these prescriptions can lead to simple
paradoxes. Perhaps the simplest is the fact that distributions arrived at by
these rules are not invariant over different ways of partitioning the event
space. If I am completely ignorant about the weather tomorrow, simple
application of the principle of indifference would give probability 1/2 to
rain, and probability 1/2 to not-rain. However, I am equally ignorant as to
whether it will rain, snow, or be fair, in which case the probability of rain
is 1/3. Almost any probability for rain less than 1/2 can be derived by
selecting other possible partitions of the states of the weather.
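
The partition dependence can be made concrete with a minimal sketch. The
helper name `indifference` and the event labels are illustrative assumptions,
not notation from the text; the code only applies the rule of indifference to
two different partitions of the same weather states.

```python
from fractions import Fraction

def indifference(partition):
    """Give every cell of the partition the same probability (rule of indifference)."""
    n = len(partition)
    return {event: Fraction(1, n) for event in partition}

# Two ways of carving up tomorrow's weather (labels are illustrative).
two_way = indifference(["rain", "not-rain"])
three_way = indifference(["rain", "snow", "fair"])

p_rain_two = two_way["rain"]      # probability of rain under the two-cell partition
p_rain_three = three_way["rain"]  # probability of rain under the three-cell partition
print(p_rain_two, p_rain_three)
```

The same state of ignorance yields two different probabilities for rain,
depending only on how the event space was partitioned.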

An analogous puzzle arises when dealing with distributions on continuous
quantities. If complete ignorance about a given quantity is taken to be a
uniform distribution (over a given interval), then the distribution of the
logarithm of the quantity (about which at least as much ignorance would be
expected) is by no means uniform.
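
A small numerical sketch of this point, under the assumption that the quantity
is uniform on the interval [1, 100] (the interval and the function name are
illustrative choices). If the logarithm were also uniform, half of the
probability would fall below the midpoint of the logarithmic range; the sketch
computes the actual fraction.

```python
import math

def prob_log_below_midpoint(a, b):
    """For X uniform on [a, b] with 0 < a < b, return
    P(log X < midpoint of the interval [log a, log b])."""
    mid_log = 0.5 * (math.log(a) + math.log(b))  # midpoint of the log range
    x_mid = math.exp(mid_log)                    # corresponds to sqrt(a * b)
    return (x_mid - a) / (b - a)                 # uniform CDF of X at that point

p_lower_half = prob_log_below_midpoint(1.0, 100.0)
print(p_lower_half)  # far from the 0.5 a uniform log-quantity would give
```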

A similar  difficulty  applies  to  the  min  loss  rule.  In  the  complete 
ignorance  case,  P is  a function  of  the  selected  partition  of  U. 

The fact that various nominal rules for assigning probabilities are not
invariant under changes in the partition of U indicates that concepts such
as uniform distributions, or even uniform distributions of distributions, are
not a complete explication of the notion of total ignorance or total
uncertainty. Most attempts to define a logical measure for solidity are based
on the assumption that there is some absolute partition ("atomic events")
which cannot be further subdivided. If there were such an irreducible
partition, then complete ignorance could be defined by a uniform distribution
on that partition. Unfortunately, there does not appear to be a meaningful
criterion for specifying such atomic events.

5. Maximum Entropy and Minimum Score

A rule which, at first sight, appears to be similar to the min loss rule,
has been receiving increasing attention by statisticians and information
theorists. The rule goes under various names; perhaps the most popular is the
principle of maximum entropy. The rule can be formulated as: given an event
space U, and certain a priori information I concerning U, the most reasonable
extension of I to a complete probability distribution D on U is that
distribution which maximizes the entropy of D given I. The entropy of a
discrete distribution on a set of events is just $-\sum_j P_j \log P_j$. The
corresponding expression for a continuous distribution D(x) is
$-\int D(x) \log D(x)\,dx$. Entropy is a basic notion in the theory of
communication as developed by Claude Shannon. As we saw in the discussion of
probabilistic scores, this definition of entropy is equivalent to the negative
of the expected logarithmic score. Hence, the principle of maximum entropy
could be restated as, minimize the expected logarithmic score, given I.
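
The relation between entropy and the expected logarithmic score can be checked
with a short sketch. The function names and the two sample distributions are
illustrative assumptions; the expected score is taken for an honest estimate,
i.e., the reported distribution equals the true one.

```python
import math

def expected_log_score(p):
    """Expected logarithmic score, sum_j p_j * log(p_j), of an honest estimate p."""
    return sum(pj * math.log(pj) for pj in p if pj > 0)

def entropy(p):
    """Entropy is the negative of the expected logarithmic score."""
    return -expected_log_score(p)

uniform = [1 / 3, 1 / 3, 1 / 3]
skewed = [0.7, 0.2, 0.1]

# The uniform distribution has the larger entropy, hence the smaller expected score.
print(entropy(uniform), entropy(skewed))
```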

The  maximum  entropy  rule  can  be  said  to  be  reasonable  in  two  ways: 

(a) It is a hedge against bias. If we define a counterpredictor as one who
makes an estimate with a lower expected log score than the complete ignorance
score, the maximum entropy rule will assure that the resulting estimate is not
counterpredictive.

(b) A maximum entropy estimate is the "weakest" assumption possible given I,
i.e., it adds the least possible information to I of any estimate.

The second statement must be taken with some caution. The maximum
entropy rule is just as sensitive to the chosen partition as any other rule.
In addition, if the negative entropy is interpreted as a probabilistic score,
then other score rules may generate different estimates. This point will be
elaborated later in this section.

The principle of maximum entropy is often recommended in the context of
generating a priori distributions for Bayesian inference, when the a priori
probabilities are incompletely known. Traditionally, one specialized form
of this recommendation has been the ascription of a uniform distribution when
the a priori probabilities are unknown. This rule has been controversial, but
difficult to either establish or kill. Some statisticians like R. A. Fisher have
found the rule outrageous, and have rejected the use of a priori probabilities
in statistical inference. It is my impression that of late, along with a
general spread of the subjective theory of probability, there has been an
increasing willingness to employ both the notion of a priori probabilities, and
the assignment of uniform distributions when the a priori probabilities are
unknown.

One frequently employed justification of both subjective a priori
probabilities, and uniform distributions with incomplete information, is the
tacit assumption that use of these "devices" will be limited to the case of
inference where some objective data is available, e.g., from an experiment or
an observation. The function of the rule is to "get the inference started." It
is an elementary exercise to demonstrate that as the amount of objective data
increases, the influence of the a priori assumptions rapidly declines: "the
a posteriori overwhelms the a priori."^9 It is not clear whether any advocate
of uniform priors would recommend uniform distributions for cases where no
objective data is available.

In the case of complete lack of information, the principle of maximum
entropy implies a uniform distribution for a discrete event space. Theorem 4,
Chapter II, states that for any normal proper score rule, H(P) is a minimum
for a uniform distribution; and since the logarithmic score rule is normal,
the theorem applies to it in particular. A uniform distribution cannot be
defined for a continuous quantity with an infinite range. On the infinite
real line, any finite uniform function has an infinite integral. Thus, there
is no way to impose the maximum entropy rule for continuous quantities without
assuming some minimum amount of information, e.g., restricting the distribution
to a finite interval, or assuming that some property of the distribution such
as the mean is known. For example, in a two-alternative event space, if it is
known that the probability of a given alternative is greater than or equal to
a given bound, e.g., P(E) ≥ .7, then the minimum expected score distribution
(for normal scores) is just P(E) = .7.
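
The bounded two-alternative case can be illustrated with a minimal sketch. It
assumes the quadratic score, for which the expected score of an honest
estimate is p^2 + (1 - p)^2, and uses a simple grid search; the function names
and the grid size are illustrative choices, not part of the text.

```python
def quadratic_H(p):
    """Expected quadratic score p^2 + (1 - p)^2 of an honest two-alternative estimate."""
    return p * p + (1 - p) * (1 - p)

def min_score_estimate(lower, H=quadratic_H, steps=1000):
    """Grid-search the value of P(E) in [lower, 1] with the smallest expected score."""
    grid = [lower + (1 - lower) * k / steps for k in range(steps + 1)]
    return min(grid, key=H)

p_star = min_score_estimate(0.7)
print(p_star)  # the minimum sits on the boundary of the allowed interval
```

Because the expected score increases as the estimate moves away from the
uniform value .5, the constrained minimum falls at the bound itself.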

As another example, consider the case of a three-alternative event space
U = (E_1, E_2, E_3). Assume a random variable X defined on U, such that X = 3
if E_1 occurs, X = 1 if E_2 occurs, and X = 0 if E_3 occurs. Assume that the
information I is that the average of X is 1.5. We can ask, what is the minimum
score distribution on U, given I? Call the minimum score distribution P*.

There are three equations expressing the situation.

1. $\sum_i P(E_i) = 1$

2. $\sum_i P(E_i) X_i = 1.5$

3. $H(P^*) \le H(P)$, for any P fulfilling 1 and 2.

To simplify the notation, let p = P(E_1), q = P(E_2), whence P(E_3) = 1 - p - q.
If we use the quadratic score for variety, we have

$$H(P) = p^2 + q^2 + (1 - p - q)^2$$

From 1 and 2,

$$3p + q + 0\,(1 - p - q) = 1.5$$

whence q = 1.5 - 3p. Substituting in the expression for H(P),

$$H(P) = 14p^2 - 11p + 2.5$$

Taking the derivative with respect to p, and setting it equal to 0, we find
p = 11/28, whence q = 9/28. The second derivative is positive, verifying that
the point is a minimum. Figure 33 illustrates the computation. Equation 1
limits the possibilities to the triangular simplex; equation 2 further limits
the possibilities to the line labelled I, and equation 3 selects P* on this
line.
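
The arithmetic of this example can be verified with exact fractions. The
sketch below assumes the values X = 3, 1, 0 implied by the constraint
3p + q = 1.5 and uses the quadratic expected score; the function name `H`
mirrors the text, while the remaining variable names are illustrative.

```python
from fractions import Fraction

def H(p):
    """Quadratic expected score along the constraint line q = 1.5 - 3p."""
    q = Fraction(3, 2) - 3 * p
    r = 1 - p - q                      # probability of the third alternative
    return p * p + q * q + r * r

# dH/dp = 28p - 11 = 0 gives the stationary point.
p_star = Fraction(11, 28)
q_star = Fraction(3, 2) - 3 * p_star
r_star = 1 - p_star - q_star
print(p_star, q_star, r_star)
```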


Maximum entropy distributions have been computed for a variety of types
of distributions and typical kinds of prior information. For example, the
maximum entropy distribution for a quantity with an infinite range in each
direction, with known mean and known standard deviation is just the familiar
normal (Gaussian) distribution.^^ It would doubtless be instructive to
compute minimum score distributions for comparable cases for other types of
proper scores.

Since entropy is just one out of an unlimited set of score rules, it
would appear to be reasonable to generalize the maximum entropy principle
to a minimum expected score principle. That is: Given that a particular
score rule S(R,j) has been selected in a given problem, and given prior
information I, then minimize the expected score, assuming information I.

This rule is particularly interesting when applied to decisional scores.
If we have a decision matrix, $U_{ij}$, as we have seen, the decisional score
rule is defined by the prescription, maximize the expression
$\sum_j P_j U_{ij}$, where the maximization occurs over the actions i. For the
moment assuming no information about the contingencies $E_j$, the minimum
expected score rule is defined by

$$\min_R \max_i \sum_j R_j U_{ij} \tag{4}$$

where the minimization on R is over the entire simplex $\sum_j R_j = 1$.

To take a simple example, consider the decision matrix

             E    not-E
        A    3      1
        B    0      5

If we have no idea whatsoever concerning the probability of E, (4) recommends
positing that the probability of E is the R that minimizes
$\max_i \sum_j R_j U_{ij}$. The analysis is diagrammed in Fig. 34a. For
0 ≤ R < 4/7, B generates the higher expected utility. For 4/7 < R ≤ 1, A takes
over. The rule recommends assuming that R = 4/7, the probability that produces
the minimum of the maximum expectations. The expected return on this
assumption is 3 × 4/7 + 1 × 3/7 = 15/7 = 0 × 4/7 + 5 × 3/7.
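
The computation can be sketched in a few lines. The payoffs used here
(A: 3 and 1, B: 0 and 5) are inferred from the expected-return arithmetic in
the text and should be read as an assumption; the function names are
illustrative. Since the maximum of two straight lines is convex, the minimum
over R lies either at an endpoint or where the lines cross.

```python
from fractions import Fraction

# Rows are actions; columns are payoffs under E and under not-E (assumed values).
U = {"A": (3, 1), "B": (0, 5)}

def expected_utility(action, r):
    """Expected payoff of an action when E has probability r."""
    u_e, u_not_e = U[action]
    return u_e * r + u_not_e * (1 - r)

def best_expectation(r):
    """Largest expected payoff over the actions for a given r."""
    return max(expected_utility(a, r) for a in U)

def min_score_r():
    """Return the r in [0, 1] that minimizes the best expectation."""
    (a_e, a_n), (b_e, b_n) = U["A"], U["B"]
    # Crossing point of the two expectation lines (assumes they are not parallel).
    crossing = Fraction(b_n - a_n, (a_e - a_n) - (b_e - b_n))
    candidates = [Fraction(0), Fraction(1)]
    if 0 <= crossing <= 1:
        candidates.append(crossing)
    return min(candidates, key=best_expectation)

r_star = min_score_r()
print(r_star, best_expectation(r_star))
```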

Fig. 34a illustrates a general point, namely, that for score rules which
are not normal, the min score principle does not generate a uniform distribution,
even for the case of no information.

The min score rule allows us to invoke some results of zero-sum game
theory. If we think of R as the analogue of the mixed strategy of an opponent,
then the min score rule is the analogue of the min max solution of two-person,
zero-sum games. If we introduce the possibility of mixed actions on the part
of the decision maker, then the basic theorem of two-person zero-sum games can
be used to establish the result that a mixed action for the decision maker
exists which guarantees, at least in terms of expectation, the min score
expected utility.


Let $S_i$ designate a mixed action for the decision maker; that is,
$\sum_i S_i = 1$, and $S_i$ is the probability with which action i is
selected. Then

$$\min_R \max_i \sum_j R_j U_{ij} = \max_S \min_j \sum_i S_i U_{ij} \tag{5}$$

(5) is a direct consequence of the fundamental theorem of game theory.

The result is illustrated in Fig. 34b. The abscissa is now S, the probability
with which action A is selected. For 0 ≤ S < 5/7, the minimum expectation is
from E. The maximum of the minimum expectations occurs at S = 5/7. If the
decision maker chooses A with frequency 5/7 and B with frequency 2/7, then
his expected payoff is 15/7, whatever the probability of E. As we saw above,
15/7 is also the expected return on the min score assumption R = 4/7.
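
A quick check of the mixed-action result, again assuming the payoffs A: (3, 1)
and B: (0, 5) inferred from the arithmetic in the text. The names `payoff`,
`worst_case`, and `guaranteed` are illustrative. The sketch confirms that
playing A with probability 5/7 yields 15/7 for every probability of E, and
that no other mixture on a coarse grid guarantees more.

```python
from fractions import Fraction

# Payoffs under E and under not-E (assumed values).
U = {"A": (3, 1), "B": (0, 5)}

def payoff(s, r):
    """Expected payoff when A is chosen with probability s and E has probability r."""
    a_e, a_n = U["A"]
    b_e, b_n = U["B"]
    exp_a = r * a_e + (1 - r) * a_n
    exp_b = r * b_e + (1 - r) * b_n
    return s * exp_a + (1 - s) * exp_b

def worst_case(s):
    """Payoff is linear in r, so the worst case is at r = 0 or r = 1."""
    return min(payoff(s, Fraction(0)), payoff(s, Fraction(1)))

s_star = Fraction(5, 7)
guaranteed = [payoff(s_star, Fraction(k, 10)) for k in range(11)]
print(guaranteed[0], worst_case(s_star))
```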

Through the route of the min expected score rule, we have arrived at
a suggestion which is fairly old as such things go in the theory of decisions
under uncertainty, namely the min max rule. Although the suggestion appeared
shortly after the publication of von Neumann and Morgenstern's basic work on
game theory, it has not received general acceptance as a "solution" to the
problem of decisions with no information. Ironically, part of the reason is
that it works so well for zero-sum game theory. The objection is that the
rule does the equivalent of casting nature in the role of an inimical opponent,
i.e., something which is "striving" to minimize the decision maker's rewards.
Since all of physics implies that nature is neutral, the min max rule appears
more pessimistic than necessary.

It is not clear that biology carries the same message. If the competitive
interpretation of natural selection is accepted, there is some reason
for assuming that the living part of nature is hostile. For example, plants
secrete poisons or grow thorns to discourage the tendency of animals to eat
them. But that would not be relevant to the estimation, e.g., of the probability
of an earthquake.

Another potential objection, which is a good deal stronger, is that if there
are saddlepoints in the matrix, then the rule ignores potential outcomes which
are highly favorable, and not ruled out by the information (or lack thereof) of
the decision maker. For example, if the matrix is

             E    not-E
        A    1      2
        B    0      x

then the rule says select action A, no matter what x is. If x is 10, with
the payoff in dollars, a decision maker would probably be tempted to try B.
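
The insensitivity to x can be seen in a minimal sketch that picks the pure
action with the best worst-case payoff, which is the action the saddlepoint
singles out in this matrix. The helper names and the sample values of x are
illustrative assumptions.

```python
def maximin_action(matrix):
    """Return the action whose worst-case payoff is largest."""
    return max(matrix, key=lambda action: min(matrix[action]))

def matrix_with(x):
    """The example matrix: payoffs under E and under not-E, with x left free."""
    return {"A": (1, 2), "B": (0, x)}

# However large the favorable payoff x becomes, B's worst case stays at 0.
choices = {x: maximin_action(matrix_with(x)) for x in (1, 10, 1000, 10**6)}
print(choices)
```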

The min score rule addresses the same issue as the min loss rule derived
in Section 2. They generate different estimates, even with informational
score rules. Consider the two elementary examples discussed earlier. For the
two-alternative case where P(E) is between .7 and 1, the min score rule gives
R* = .7, whereas the min loss rule gives R* = .85, the average of .7 and 1.
In the three-alternative example, with the average of the random variable = 1.5,
the min score rule gives R* = (11/28, 9/28, 8/28) whereas the min loss rule gives
R* = (.375, .375, .25). In the case of a simple decision matrix with no
constraints on the probabilities, the min loss rule gives the uninspired
recommendation R* = P, the uniform distribution. Thus, for the matrix of
Fig. 34 the min loss rule selects action B, rather than the mixed action
selected by the min score rule.

The min score rule has the peculiarity that it recommends minimizing
whatever reward function is operative in the decision. That feature has a
non-intuitive flavor. Normally, in decision theory, rules are formulated so
as to maximize some reward (or to minimize a loss). The min loss rule is more
in the "mainstream." As we have seen, it is equivalent to maximizing the
average expected score. If the min score rule is applied to the formulation
of a prior distribution as the first step in a Bayesian inference, there is
perhaps some justification for the minimization, in that it will "contaminate"
the remainder of the inference as little as possible. However, it is not clear
how one might justify the minimization if the resulting estimate is to be used
directly in a decision.

Since the min loss rule is independent of the score rule employed, and
because it involves maximizing the score, it would appear to be the preferred
rule for estimating "unknown" distributions for decisions. The following
section examines some experimental data relevant to this conclusion.

6. Uncertainty and Choice Behavior

Perhaps the most persuasive evidence that there is a distinction between
uncertainty and probability is a set of experiments which appear to show that
individual choice behavior under uncertainty is incompatible with the postulates
of subjective probability. Some of these have been triggered by the arguments
of Ellsberg concerning the appropriateness of the sure-thing postulate for
choices with incomplete information.^12 Others have derived from the work of
Allais.

To take up the Allais paradox first, it is my impression that this type
of puzzle can be resolved within the ambit of personal probability theory
without invoking any new notions. The Allais puzzle goes like this. Suppose
I ask, which would you rather have, one million dollars for sure, or five
million dollars with a probability of .8? Most individuals who have been
asked this question have little difficulty deciding they would prefer the
million for sure. (I don't think any billionaires have been among the
respondents.) Now suppose I ask, which would you rather have, one million
dollars with a probability of .1 or 5 million dollars with a probability of
.08? Most individuals would prefer the 5 million dollars with probability
.08. On the face of it, this violates the standard prescription that
individuals should maximize their expected utility. There is no pair of
utilities for one million and five million dollars which will rationalize
these choices.

To see this, let the utility of one million dollars be U(1) and the utility
of five million dollars be U(5). We want U(1) > .8U(5) and .1U(1) < .08U(5).
Multiplying both sides of the second inequality by 10 we get U(1) < .8U(5),
which contradicts the first inequality.

However, the puzzle as usually presented overlooks a well-known judgmental
phenomenon, namely the influence of context. If someone offers me a choice
between a million dollars for sure, and some probabilistic outcome, if I believe
the individual, then in a quite reasonable sense, I have a million dollars.
Whatever the status of my fortune before the offer, it has changed drastically.
The actual choice is now between 0 (i.e., minus one million dollars) with the
probability .2, and 5 million (i.e., plus four million dollars) with the
probability .8. In short, the offer resets the zero of the decision situation
to one million dollars. There is no puzzle in assuming that the loss of a
million dollars would more than compensate for the .8 chance of getting four
more million dollars.*

Suppose we assume a simple exponential utility function $U(x) = 1 - e^{-x}$
where x is measured, say, in units of 1/4 million dollars. The extra four
million is a gain of 16 units; the utility is essentially 1. The loss of a
million is the loss of four units, and the utility is about -54. In this case,
the individual would reject any option in which the probability of the five
million is less than .98. Interestingly, with this utility function, the
outcome is highly sensitive to the units. If the individual was thinking in
terms of units of, say, $1000, the utility of a loss of a million dollars would
be, for all practical purposes, negatively infinite.
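
The .98 threshold can be reproduced with a short sketch. It assumes the
utility function U(x) = 1 - exp(-x) with x in units of 1/4 million dollars and
the zero reset to the sure million, so that keeping the million has utility 0;
the variable names are illustrative.

```python
import math

def utility(x):
    """Assumed exponential utility, x in units of 1/4 million dollars."""
    return 1.0 - math.exp(-x)

gain = utility(16)   # four more million: essentially 1
loss = utility(-4)   # losing the sure million: a large negative utility

# The gamble beats the sure million (utility 0) only if
# p * gain + (1 - p) * loss > 0, i.e. p exceeds this threshold.
threshold = -loss / (gain - loss)
print(gain, loss, threshold)
```

With the unit changed to $1000, the loss term becomes 1 - exp(1000), which
overflows ordinary floating point; this is the sensitivity to units noted
above.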

The principle invoked in the preceding could be called the context
dependent zero; that is, the zero point of the utility scale is dependent
on the situation, including the options presently available.

* The pioneering paper of Friedman and Savage^14 on nonlinear utility of money
as an explanation of various kinds of behavior which appear anomalous if the
value of money is considered linear would then explain this, as well as other
puzzles.

The same transformation of zero is not appropriate for the second choice
situation, since now there is no assurance that the million is available.

The context dependent zero analysis does not appear to be adequate for
the Ellsberg case. The Ellsberg type paradox involves a choice between rewards
with known, and with unknown probabilities. A typical situation (taken from
McKrimmon^15) is that of an urn drawing, with three kinds of balls (say red,
blue, and green) in which the proportion of red balls is specified, say 1/3,
but the relative proportion of blue and green is not specified. Two different
choices are proposed, e.g., those in Table 1.


Table 1

                1/3             2/3
                Red        Blue      Green
        A      $1000         0         0
        B        0         $1000       0
        A'     $1000         0       $1000
        B'       0         $1000     $1000

Asked to choose between A and B, most subjects will choose A. Asked to
choose between A' and B', most subjects will choose B'. As in the
untransformed Allais example there is no choice of subjective probabilities
for Blue and Green, and utilities for 0 and $1000 that will account for these
choices. Roughly speaking, the subjects appear to prefer choices in which the
rewards are more "surely known" even if probabilistic. The issue can be further
sharpened by noting that the rewards if Green is drawn are identical for A and
B and identical for A' and B'. Thus, whatever could make A preferable to B
should also make A' preferable to B'. In short, the observed choices violate
the sure-thing principle, P6, Chapter II.

Situations with choice tasks like those in Table 1 have been tried in
several experimental series. The results from these experiments are all
similar: the majority of subjects choose A and B'.

Although it is possible to find a kind of resolution for the Ellsberg
paradox using the context dependent zero notion (in this case assuming that
the more certain of the options generates a new "expected zero"), the result
appears labored.

If the min score rule is applied to Table 1, since the probability of
Red is fixed, the minimization is carried out for 0 ≤ P(B) ≤ 2/3, where P(B)
is the probability of blue. We can diagram the two choice situations as in
Figs. 35a and b, where the abscissa is the (unknown) probability P(B) of blue.
In Fig. 35a, if the individual selects A, then his expectation is a constant
1/3 × $1000. If he selects B, then his expected outcome could lie anywhere
along the line labeled B. If P(B) > 1/3, then his expectation is maximized
if he chooses B. For P(B) < 1/3, his expectation is maximized if he chooses
A. The minimum of the maximum outcomes occurs where the two lines cross at
P(B) = 1/3.

However, choosing A assures obtaining the min max, since the expectation
of A is a constant. Put in other terms, the pair (A,G) is a saddlepoint of
the matrix

                   B              G
        A     1/3 $1000      1/3 $1000
        B       $1000            0

which is obtained by omitting the constant probability column Red.

For the second option, Fig. 35b, it is B' which assures the min max
outcome. The basic asymmetry between the two cases can thus be expressed in
the statement that in both cases there is a pure action which
guarantees the min max outcome, but in the first choice the pure action A is
associated with the fixed probability for Red, whereas in the second choice,
the pure action is associated with the fixed probability for Blue or Green.

To this extent, the min score (min max expectation) rule is in accord
with the subjects' choices. However, the fact that the min score rule fits
the experimentally observed choices for these two examples is by no means a
demonstration that it would be followed in other decision problems with
incomplete information. McKrimmon has run a number of experiments with the
same basic decision problem as displayed in Table 1, varying the probability
of Red. In his experiments, P(R) varied from .2 to .5. The extent to which
his groups (which numbered 19 subjects) exhibited "Ellsberg type decisions"
was a maximum at P(R) = 1/3, and declined on either side. At P(R) = .5, A
dominates B and A' dominates B', as displayed in Figs. 36a, b. Thus, we
would expect that at P(R) = .5, all subjects would select A and A', which is
what McKrimmon found.

The explanation for the other cases is not so obvious. A and B' are the
min max solutions for all of the cases except P(R) = .5. If the min loss rule
is applied to Table 1, B and B' are the preferred choices for P(R) < 1/3,
and A and A' are the preferred choices for P(R) > 1/3. At P(R) = 1/3, the min
loss rule is indifferent between A and B and between A' and B'.
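
The two rules can be compared on Table 1 with a minimal sketch. Payoffs are in
units of $1000, the min loss rule is taken to mean evaluating each option at
the uniform average of the unknown P(Blue), and the min max choice is taken as
the pure option with the best worst case; the function names are illustrative
assumptions.

```python
from fractions import Fraction

def expectations(p_red, p_blue):
    """Expected payoff (units of $1000) of each option in Table 1."""
    p_green = 1 - p_red - p_blue
    return {"A": p_red, "B": p_blue,
            "A'": p_red + p_green, "B'": p_blue + p_green}

def min_max_choice(p_red, pair):
    """Option in `pair` with the best worst case over the unknown P(Blue).
    Expectations are linear in P(Blue), so only the two extremes matter."""
    extremes = [Fraction(0), 1 - p_red]
    return max(pair, key=lambda o: min(expectations(p_red, b)[o] for b in extremes))

def min_loss_choice(p_red, pair):
    """Option in `pair` with the higher expectation at the uniform average
    P(Blue) = (1 - p_red) / 2; returns None when the two are tied."""
    avg = expectations(p_red, (1 - p_red) / 2)
    first, second = pair
    if avg[first] == avg[second]:
        return None
    return first if avg[first] > avg[second] else second

for p_red in (Fraction(1, 5), Fraction(1, 3), Fraction(2, 5)):
    print(p_red,
          min_max_choice(p_red, ("A", "B")), min_max_choice(p_red, ("A'", "B'")),
          min_loss_choice(p_red, ("A", "B")), min_loss_choice(p_red, ("A'", "B'")))
```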

For other values of P(R), we can define what might be called the "relative
advantage" of one action over another as the proportion of the undetermined
interval 1 - P(R) in which the first action has a higher expected value than the
second. In Fig. 35a, the relative advantage of A over B is 1/2. In Figs. 37a
and b, the choices are diagrammed for the case P(R) = .2. The relative
advantage of A over B is the ratio of the solid part of the line labelled A to
the total line, in this case 1/4. The relative advantage of B' over A' is 3/4.
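
The relative advantage figures can be reproduced with a small sketch. It
assumes the Table 1 payoffs, so that A beats B when P(Blue) < P(R) and B'
beats A' when P(Blue) > P(R), with P(Blue) ranging over an interval of length
1 - P(R); the function names are illustrative.

```python
from fractions import Fraction

def advantage_A_over_B(p_red):
    """Share of the interval [0, 1 - p_red] of P(Blue) on which A beats B."""
    return min(p_red, 1 - p_red) / (1 - p_red)

def advantage_Bprime_over_Aprime(p_red):
    """Share of the interval [0, 1 - p_red] of P(Blue) on which B' beats A'."""
    return max(1 - 2 * p_red, 0) / (1 - p_red)

for p_red in (Fraction(1, 5), Fraction(1, 4), Fraction(1, 3), Fraction(2, 5)):
    print(p_red, advantage_A_over_B(p_red), advantage_Bprime_over_Aprime(p_red))
```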
In Fig. 38 the McKrimmon data is plotted with the ordinate showing the
proportion of times a given action was selected, and the abscissa showing the
relative advantage of that action. The two sets of points are for A over B
and B' over A'. A is the min max action in the first choice, and B' is the
min max action in the second choice, except for the case P(R) = .5. The point
at the origin is the result for P(R) = .5.

The proportion selecting a given action is nicely monotonic in the
relative advantage, and the behavior of the two curves is surprisingly similar
(note the two identical points), despite the fact that the two are in reverse
order with respect to the value of P(R) involved. For example, the two
identical points at (.33, .42) are for A over B at P(R) = .25 and for B' over
A' at P(R) = .4.

The effect of the min max property of A and B' shows up in the fact that
the proportion selecting these two mounts close to 1 when the relative
advantage is only .5. In the linear region between .2 and .5 on relative
advantage, the data fit the hypothesis that the subjects are choosing between
the min score and the min loss "solutions" in a ratio roughly about 2.7 times
the relative advantage.

Without additional experimentation designed specifically to test the
hypothesis inherent in Fig. 38, it would be hasty to call it more than
suggestive. What does appear to be firm, from the published experimental data,
is that the upper level distribution model, i.e., the min loss rule, does not
completely fit observed choices under incomplete information, and neither does
the min score rule.

Figure 38. Proportion Choosing Given Action as Function of Relative Advantage

7. Nominal Estimates with Factor Models

One of the drawbacks to general prescriptions like the min loss rule is
the fact that they have a mainly negative import. The chief advantage appears
to be guarding against bias. Put another way, although such rules are "safe,"
they also appear to be "weak"; the naive expectation is that they would
generate low returns if applied in practice. Part of the misapprehension here
is an illusion concerning the excellence of everyday decisions. Although there
has not been a general survey of the quality of decisions in industry,
government and private affairs, the evidence is mounting that many decisions in
everyday life would be dramatically improved if "weak" nominal estimates were
substituted for the guesses and hunches which presently guide the decisions.

Some of the positive advantages of nominal estimation rules can be seen
more clearly in the context of factor models. In the factor model approach
to estimation, if we restrict attention to the elementary case in which the
values of the factors are given, and the individual makes his estimate knowing
these values, then the only task is to assign subjective weights to the factors
and "perform the arithmetic." In practice, it seems unlikely that estimates
are arrived at in this formal way. However, for many estimation tasks, it
appears to be a reasonable approximation.

The figure of merit most often used for factor models is correlation. In
the elementary case we are now considering, there is a set of objects
X = {x, y, z, ...} where each object is a vector $(x_1, \ldots, x_n)$ and
there is some function t(x) which defines the quantity to be estimated. The
$x_i$ are the factors. An individual estimates t by determining a set of
weights $a_i$ for the factors, and asserts $r(x) = \sum_i a_i x_i$.* We can
compute the correlation between r and t as

$$\rho(r,t) = \frac{\overline{\left(\sum_i a_i x_i - \overline{\sum_i a_i x_i}\right)\left(t - \bar{t}\right)}}{s_r s_t} \tag{6}$$

* Normally, it would be necessary to add a constant, but since correlation is
invariant under a linear transformation on the quantities being correlated,
the constant will be omitted for simplicity.


In this case, the bar over an expression indicates the mean of that expression,
and m is the total number of cases. Simplifying (6), we obtain

$$\rho(r,t) = \frac{1}{s_r} \sum_i a_i s_i\, \rho(x_i,t) \tag{7}$$

If we assume the variables have been normalized, so that $s_i = 1$, and
furthermore assume that the $x_i$ are uncorrelated, i.e., $\rho(x_i,x_j) = 0$
for all i and j with $i \ne j$, then (7) reduces to

$$\rho(r,t) = \frac{\sum_i a_i\, \rho(x_i,t)}{\sqrt{\sum_i a_i^2}} \tag{8}$$

For the case of uniform weights, $a_i = 1$ for all i, (8) becomes

$$\rho(r,t) = \frac{1}{\sqrt{n}} \sum_i \rho(x_i,t) \tag{9}$$

Thus, for this elementary case, an estimate based on uniform weights is
better than the average correlation between the variables and the true answer
by a factor of $\sqrt{n}$. If all the individual correlations are positive, then (9)
indicates that the uniform weight estimate will improve with each additional
variable. As a rough illustration, suppose each $\rho(x_i,t)$ is about .2, and
n = 5; then $\rho(r,t)$ = .45.
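
The rough illustration above can be checked numerically. The sketch below assumes normalized, mutually uncorrelated factors, so that the correlation of a weighted estimate is the weighted sum of the factor correlations divided by the root sum of squared weights. The function name `weighted_estimate_correlation` is hypothetical and chosen only for this example.

```python
import math

def weighted_estimate_correlation(weights, factor_correlations):
    """Correlation between r = sum_i a_i x_i and t, assuming the factors are
    normalized and mutually uncorrelated:
    sum_i a_i rho(x_i, t) / sqrt(sum_i a_i^2)."""
    numerator = sum(a * rho for a, rho in zip(weights, factor_correlations))
    return numerator / math.sqrt(sum(a * a for a in weights))

# Five factors, each correlated about .2 with the true answer.
rhos = [0.2] * 5
uniform = weighted_estimate_correlation([1.0] * 5, rhos)
print(round(uniform, 2))  # 0.45

# Rescaling all weights by a constant leaves the correlation unchanged.
rescaled = weighted_estimate_correlation([3.0] * 5, rhos)
```

Under these assumptions the uniform-weight estimate exceeds the average factor correlation (.2) by the factor $\sqrt{5}$, which is the point made in the text.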

The topic can be explored a little further if we assume that
$t(x) = \sum_i b_i x_i$, i.e., the function to be estimated is also linear. We now
have (under the assumption that the variables are normalized and uncorrelated)

$$\rho(r,t) = \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2}\,\sqrt{\sum_i b_i^2}} \qquad (10)$$


Since the b's are unknown, a reasonable requirement on the a's is that
they maximize the expectation of $\rho(r,t)$ over the possible values of the b's.
Thus we would like to find the a which generates

$$\max_a \int_B \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2}\,\sqrt{\sum_i b_i^2}} \, dD(b) \qquad (11)$$

where B is the set of possible b's and D(b) is a probability distribution on
B. Since the correlation is invariant under linear transformations, there is
no loss of generality in assuming that the b's have been normalized so that
$\sum_i b_i = 1$. In the extreme case that B includes all possible sets of coefficients,
i.e., B is the simplex $\sum_i b_i = 1$, $b_i \ge 0$, and D(b) is taken to be uniform, we
have

$$\max_a \frac{1}{\sqrt{\sum_i a_i^2}} \sum_i a_i \int_B \frac{b_i}{\sqrt{\sum_j b_j^2}} \, dD(b) \qquad (12)$$

The expression $b_i / \sqrt{\sum_j b_j^2}$ is symmetric in i, as is the simplex;
thus, the integrals will be the same for all i. Hence

$$\int_B \frac{b_i}{\sqrt{\sum_j b_j^2}} \, dD(b) = k \cdot \frac{1}{n} \qquad (13)$$

Whence, (12) becomes

$$\max_a \; k \sum_i \frac{1}{n} \, \frac{a_i}{\sqrt{\sum_j a_j^2}} \qquad (14)$$

Assuming the a's are also normalized, $a_i / \sqrt{\sum_j a_j^2}$ is just the spherical
scoring rule, so (14) is formally equivalent to (6) Chapter III with
$p_j = 1/n$. Hence, the maximum over a occurs when $a_i = 1/n$ for all i.

Thus, for the case of "complete ignorance" (interpreted here as a uniform
distribution over the set of all possible b's) the maximum expected correlation
is obtained when the estimated weights are uniform.

Having determined that the best estimate given "complete ignorance" is the
set of uniform weights, there remains the question of just how good (or bad) it
is. One way to measure this is to compute the expected correlation over the
set of possible coefficients. This requires evaluating the integral

$$\rho(r,t) = \frac{1}{\sqrt{n}} \int_B \frac{dD(b)}{\sqrt{\sum_i b_i^2}} \qquad (15)$$


For the case of two variables, the integral is quite straightforward. It is

$$\rho(r,t) = \frac{1}{\sqrt{2}} \int_0^1 \frac{dx}{\sqrt{x^2 + (1-x)^2}}$$

which yields

$$\rho(r,t) = \frac{1}{2} \log \frac{\sqrt{2}+1}{\sqrt{2}-1} = .881$$


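For three or more variables, the integral becomes somewhat more complex,
taking the form

$$\rho(r,t) = \frac{(n-1)!}{\sqrt{n}} \int_0^1 \int_0^{1-x_1} \cdots \int_0^{1-x_1-\cdots-x_{n-2}} \frac{dx_1 \, dx_2 \cdots dx_{n-1}}{\sqrt{x_1^2 + \cdots + x_{n-1}^2 + (1 - x_1 - \cdots - x_{n-1})^2}} \qquad (16)$$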


A mixed  analytic  and  numerical  solution  to  (16)  for  three  variables  yields 
p(r,t)  = .834. 
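
The two quoted averages can be approximated by brute-force numerical integration. The sketch below is not the author's mixed analytic and numerical method; it simply averages the equal-weight correlation over a fine grid on the simplex, assuming normalized, uncorrelated factors and a uniform distribution of the true weights. All function names are hypothetical.

```python
import math

def equal_weight_correlation(b):
    """Correlation between the equal-weight estimate and t = sum_i b_i x_i,
    assuming normalized, uncorrelated factors."""
    n = len(b)
    return sum(b) / (math.sqrt(n) * math.sqrt(sum(v * v for v in b)))

def mean_correlation_two(steps=20000):
    """Midpoint-rule average over the segment b = (x, 1 - x), 0 <= x <= 1."""
    h = 1.0 / steps
    return h * sum(
        equal_weight_correlation(((k + 0.5) * h, 1.0 - (k + 0.5) * h))
        for k in range(steps)
    )

def mean_correlation_three(steps=400):
    """Grid average over the triangle b = (x, y, 1 - x - y), all parts positive."""
    h = 1.0 / steps
    total, count = 0.0, 0
    for i in range(steps):
        x = (i + 0.5) * h
        for j in range(steps - i - 1):  # keeps x + y < 1 at the cell midpoints
            y = (j + 0.5) * h
            total += equal_weight_correlation((x, y, 1.0 - x - y))
            count += 1
    return total / count

two = mean_correlation_two()
three = mean_correlation_three()
print(round(two, 3))  # 0.881
print(round(three, 2))
```

The two-variable value agrees with the closed form, which equals $\log(1+\sqrt{2})$. The three-variable grid average comes out close to the .834 quoted in the text; a coarse grid of this kind should only be expected to match to about two decimal places.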

For three or more variables, there is a certain embarrassment in assuming
the "complete ignorance" case, i.e., assuming B = total simplex. The assumption
includes among the possible cases those in which all but one of the b's
are zero. I'm inclined to think that if the problem is so poorly defined
that it is necessary to take into account the possibility that t(x) is a function
of only one of the variables, the model is not ready to be used in a
serious decision. A certain amount of arbitrariness enters in attempting to
formulate a suitable restriction "a priori." A convenient restriction is to
the case that no more than one of the b's is allowed to be zero. This restriction
can be expressed by setting B equal to the inscribed hypersphere in the
simplex. For the case of three variables, this restriction is illustrated in
Fig. 39. The total set of possible b's consists of the triangular region
$\sum_i b_i = 1$. The inscribed circle cuts off the extreme cases. For this B,
the problem has spherical symmetry, and we have

$$\rho(r,t) = \frac{1}{\sqrt{n}\, V(S)} \int_0^{1/\sqrt{n(n-1)}} \frac{S(r)}{\sqrt{r^2 + 1/n}} \, dr \qquad (17)$$

where V(S) is the volume of the inscribed hypersphere with radius
$1/\sqrt{n(n-1)}$, S(r) is the surface of the hypersphere with radius r, and n is the
number of variables.

For n = 3, (17) gives $\rho(r,t)$ = .90, and for n = 4, $\rho(r,t)$ = .915. These
two additional cases were as far as my patience in evaluating integrals
lasted. The fact that the average $\rho$ increases with n reflects in part the
fact that the ratio of the volume of the inscribed hypersphere to the volume
of the total simplex goes down with increasing n; more extreme cases are
eliminated.
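
The inscribed-hypersphere averages can be reproduced with a one-dimensional numerical integral. The sketch below rests on two geometric assumptions that are not spelled out in the text: the hypersphere inscribed in the simplex $\sum_i b_i = 1$ has radius $1/\sqrt{n(n-1)}$, and a point at distance r from the center of the simplex has $\sum_i b_i^2 = r^2 + 1/n$. The ratio of shell surface to ball volume in dimension d = n - 1 is then $d\,r^{d-1}/R^d$. The function name is hypothetical.

```python
import math

def mean_correlation_inscribed_sphere(n, steps=100000):
    """Average equal-weight correlation when the true weights are uniform over
    the hypersphere inscribed in the simplex sum_i b_i = 1 (midpoint rule)."""
    d = n - 1                                  # dimension of the simplex
    radius = 1.0 / math.sqrt(n * (n - 1))      # assumed inscribed radius
    h = radius / steps
    total = 0.0
    for k in range(steps):
        r = (k + 0.5) * h
        shell_density = d * r ** (d - 1) / radius ** d   # S(r) / V(S)
        total += shell_density / math.sqrt(r * r + 1.0 / n) * h
    return total / math.sqrt(n)

print(round(mean_correlation_inscribed_sphere(3), 2))  # 0.9
print(round(mean_correlation_inscribed_sphere(4), 3))  # 0.915
```

For n = 2 the inscribed "sphere" is the whole segment, so the same routine returns the two-variable value .881, which is a useful consistency check on the assumed geometry.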

Some additional insight into the effectiveness of the equal weights
approximation can be obtained by noting that $\sum_i b_i^2$ is a measure of the dispersion
of the b's. Setting $a_i = 1$ in (10), we obtain

$$\rho(r,t) = \frac{\sum_i b_i}{\sqrt{n}\,\sqrt{\sum_i b_i^2}} \qquad (18)$$

which is the correlation of the estimate with uniform weights with the true
function with unknown weights.

For example, if t = x + 2y, $\rho(r,t)$ for uniform weights is .95. If
t = x + 2y + 3z, $\rho(r,t)$ = .93.
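
The worked figures can be reproduced directly. The sketch assumes normalized, uncorrelated variables, so that the correlation of the equal-weight estimate with a linear function having true weights b is $\sum_i b_i / (\sqrt{n}\sqrt{\sum_i b_i^2})$. The function name is hypothetical.

```python
import math

def equal_weight_correlation(b):
    """Correlation of the uniform-weight estimate with t = sum_i b_i x_i,
    assuming normalized, uncorrelated variables."""
    n = len(b)
    return sum(b) / (math.sqrt(n) * math.sqrt(sum(v * v for v in b)))

print(round(equal_weight_correlation([1, 2]), 2))        # 0.95
print(round(equal_weight_correlation([1, 2, 3]), 2))     # 0.93
print(round(equal_weight_correlation([1, 0, 0, 0]), 2))  # 0.5
```

The weights (1, 2) and (1, 2, 3) give the two values quoted, and the last line shows the degenerate case in which only one coefficient is non-zero, where the correlation falls to $1/\sqrt{n}$.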

Rearranging terms in (18) we can write

$$\rho(r,t) = \frac{1}{\sqrt{1 + s_b^2/\bar{b}^2}} \qquad (19)$$

where $s_b^2$ is the variance of the true weights, and $\bar{b}$ the average.

Thus, for uncorrelated variables, the correlation between an equal-weight
approximation r(x) and any linear function t(x) is determined by the relative
variation of the coefficients of t, where the relative variation is defined
as $s_b/\bar{b}$. If the distribution of the b's is "well behaved" then the correlation
will be high. The worst case, of course, is the degenerate one where all but
one of the coefficients are zero. In this case, $\rho(r,t) = 1/\sqrt{n}$. However,
this worst case hardly appears to be of practical interest. If there is a
non-trivial likelihood that t is a function of only one of the variables, then,
as remarked earlier, introducing the approximation as a serious basis for a
decision is at best "premature".

One useful reference case is that of a uniform distribution of weights on
some interval. Whether a uniform distribution can be taken as a characterization
of "ignorance" in the case of coefficients for linear models is not as
obvious as it seems in the case of estimating a single quantity. For the
coefficients, we are estimating a set of quantities. However, it is clear
that a uniform distribution is one form of "low information" assumption about
the coefficients. For any uniform distribution on a positive interval (u,v),
$\bar{b} = \frac{1}{2}(v+u)$ and $s_b^2 = \frac{1}{12}(v-u)^2$. Thus

$$\frac{s_b^2}{\bar{b}^2} = \frac{1}{3}\left(\frac{v-u}{v+u}\right)^2 \le \frac{1}{3} \qquad (20)$$

For $s_b^2/\bar{b}^2 = 1/3$, from (19), $\rho(r,t)$ = .866. (20) is independent of the size
of the interval (u,v), and roughly independent of the number of coefficients;
there have to be enough so that a uniform distribution can be approximated.
For example, if the b's consist of a string of successive integers 1, 2, ..., n,
then (20) holds approximately for any n ≥ 2.
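
The successive-integer example can be tabulated as a check. The sketch below computes the correlation from the relative variation of the weights, $1/\sqrt{1 + s_b^2/\bar{b}^2}$, using the population variance; the function name is hypothetical and the choice of n values is arbitrary.

```python
import math

def correlation_from_relative_variation(b):
    """Equal-weight correlation expressed through the variance and mean of the
    true weights: 1 / sqrt(1 + s_b^2 / b_bar^2), with population variance."""
    n = len(b)
    b_bar = sum(b) / n
    variance = sum((v - b_bar) ** 2 for v in b) / n
    return 1.0 / math.sqrt(1.0 + variance / b_bar ** 2)

table = {}
for n in (2, 5, 20, 1000):
    weights = list(range(1, n + 1))     # b = 1, 2, ..., n
    table[n] = correlation_from_relative_variation(weights)
    print(n, round(table[n], 3))
```

For weights 1, ..., n the ratio of variance to squared mean works out to (n - 1) / (3(n + 1)), which is below 1/3 for every n and approaches it as n grows, so the correlation stays above .866 and settles toward it for long strings.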

For any distribution of coefficients more favorable than a uniform
distribution (e.g., if the coefficients tend to cluster about some intermediate
value with only a few extreme values), $\rho(r,t)$ will be greater than .866.

Although the assumption of uniform weights is weak in the sense that it
can be derived from the "complete ignorance" assumption of a uniform
distribution on all possible sets of weights, the numerical examples show that
it is not necessarily a poor assumption.

This conclusion is urged on empirical grounds by Robin Dawes.[18] He
has compared the equal weight approximation to intuitive judgments of trained
personnel in the estimation of grade point averages, and determination of
degree of mental illness from personality tests. Over a number of extensive
studies of these tasks, the equal weight approximation gave significantly
higher correlations than the original estimates.

8.  Theory  of  Information-Control 

This section is somewhat of an aside with respect to the main theme of
this chapter. The primary reason for including it is to indicate in a more
fundamental way the interrelationship between the notion of certainty
(or solidity) and the role of decision rules. In addition, the formal decision
theory outlined below is a good deal more general than the theory embodied in
the probabilistic theory of estimation of Sec. 5, Chapter II. It offers a
framework in which a wide variety of types of information can be incorporated
in decision rules. However, most of the potentialities remain to be explored.

Any decision situation involves aspects which are clearly under the
control of the decision maker, and other aspects which are clearly not under
his control, such as probabilistic events or actions which are under the
control of other decision makers. In between are aspects which are not clearly
one or the other. Bayesian decision theory assumes that there are only two
classes, actions (controlled) and events (uncontrolled). In the following
these distinctions will be blurred. It will be assumed that there is a set of
aspects which are clearly under the control of the decision maker (which
could be called capabilities) and the remaining aspects whose status is not
well-defined initially, which might be called contingencies.

Control involves two properties: (a) the decision maker can implement
a given option, and (b) he can select any alternative out of a set of options.
The second, which might be called the "free-will" assumption, is the crucial
one  for  the  following  theory.  A decision  model  is,  of  course,  not  reality 
but  a representation  of  reality  as  viewed  by  the  decision  maker.  He  can  be 
mistaken  about  his  capabilities.  However,  the  ability  to  select  one  of  the 
listed  options  is  basic  — otherwise  the  whole  exercise  is  a dream. 

This assumption has been brought under fire for the case of high
uncertainty. With insufficient information, it has been contended, an individual
can find himself in a state which is at least as bad as that of Buridan's
ass: he simply can't make up his mind. Perhaps a better expression might be
he can't make up his feelings. I'm not aware of a clear demonstration of this
potential phenomenon in laboratory studies; but vacillation is a familiar
concept in literary psychology, and "decisiveness" is a well-known trait
ascribed to successful managers. What has not been documented is the relationship
between these traits and the information-control characteristics of the
decision situation. What I propose to do is examine the consequences of
assuming that an individual can always make a choice along with some of the
more common assumptions concerning rationality of choice.

One  additional  piece  of  conceptual  apparatus  is  introduced,  namely  mixed 
actions.  For  Bayesian  decision  theory,  there  is  no  gain  involved  in 
introducing mixed actions since, under Bayesian assumptions, the optimal
action is always a pure strategy. In the following, we allow the possibility
that if an individual cannot choose between two options, one reason might be
that he would prefer a mixture of these two to either separately.

The  formal  model  is  simplified  in  several  ways.  Rather  than  starting 
with  capabilities  and  composing  these  into  potential  actions,  we  assume  that 
step  taken  and  start  with  potential  actions  (or  strategies).  Similarly,  the 
process  of  composing  contingencies  into  joint  occurrences  will  be  bypassed  and 
the model starts with a set of exclusive, possibly non-controlled states,
which, for want of imagination on my part, will also be called contingencies.

The set of contingencies is assumed to be finite. An action x will be
represented by a vector, $x = (x_1, \ldots, x_n)$, where each component $x_i$ specifies the
utility to the decision maker of the outcome if x is taken and contingency i
obtains. Thus, any two actions which engender the same vector of utilities
are considered identical.

We thus have a set X = {x, y, z, ...} of potential actions. This set is
represented by an n-dimensional space, where n is the number of contingencies.
It is assumed that X is a metric space, i.e., a distance function is defined
which fulfills D1-D3 of Section 1, Chapter III. In addition, it is assumed
that X is closed under mixtures: if x and y are any two points, then
$ax + (1-a)y$, $0 \le a \le 1$, is also a point of X. Actually there is no great loss
of generality if it is assumed that X is ordinary Euclidean n-space. A
decision problem consists of a subset, S, of X. A decision rule for X is a
choice function C(S) which specifies a subset of S. Intuitively, C(S) identifies
the members of S which are "preferred" over the other members.

There are two more or less obvious restrictions on C(S). We would not
expect the decision maker to be able to make a choice if S were unbounded,
i.e., if for every x in S he can find another which is preferable to x. In
the same vein, we would not expect a choice if there were sequences which,
though bounded, had no limit; again, for any x, there could be a y preferable
to x. Thus, we require that there be a C(S) only for those S's which are
bounded from above, and which contain all their limit points.

A third condition is less common. C(S) is required only for S's which
are convex, i.e., if x and y are in S, then $ax + (1-a)y$, $0 \le a \le 1$, is in S.
This is the condition which allows a decision maker, if he chooses, to
reject both of two alternatives and select a mixture of the two instead.

It  has  the  small  drawback  that  no  pair  of  actions  x and  y can  be  compared 
directly.  Any  consideration  of  x and  y requires  taking  into  account  all 
possible  mixtures  as  well. 

With these preliminaries, we can state the postulates governing the
model.

H1. For every convex S which is bounded from above, and closed,
C(S) exists. C(S) is a subset of S.

S is bounded from above if, for every i, there is a constant $c_i$, and for every
x in S, $x_i < c_i$. S is closed if, for every sequence $x^{(k)}$, if $x^{(k)}$ is in S and
$x^{(k)} \to x$ as $k \to \infty$, then x is in S.

For the next postulates we need a definition. Intuitively, C(S) imposes
a partial relationship on X in that if x is a member of C(S), it is to that
extent preferred to the other members of S. This relationship will be indicated
by $x \ge' y$.

DH1. $x \ge' y$ means there is an S and x belongs to C(S) and y is in S.

H2. Dominance. If $x_i \ge y_i$ for every i, then $x \ge' y$. If, in addition,
$x_i > y_i$, then $x >' y$.

This is the old familiar postulate. The first inequality is a straightforward
inequality on the components of the points x and y. The implied
inequality is a preference relation between the points themselves. $x >' y$
simply means $x \ge' y$ and not $y \ge' x$.

$\ge^*$ will be used to designate the ancestral relation of $\ge'$.

DH2. $x \ge^* y$ means there is a sequence $x_1, \ldots, x_n$, with $x = x_1$ and
$y = x_n$, and $x_i \ge' x_{i+1}$, $i < n$.

That is, $x \ge^* y$ if there is a sequence of S's such that $x_i$ is in $C(S_i)$ and
$x_{i+1}$ is in $S_i$.

H3. Acyclicity. If $x >^* y$, then not $y \ge^* x$.

H3 requires that $\ge^*$ does not go around in a circle where at least one of
the links is a strict inequality. It, in effect, enjoins the decision maker
from engaging in a series of decisions in each of which he thinks he is
bettering himself, and winding up accepting an alternative he had previously
rejected. This postulate is closely related to the independence of irrelevant
alternatives axiom that will be discussed in Chapter VI on group values.
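
The acyclicity requirement can be illustrated on a finite record of choices. The sketch below is a simplification: the model in the text works with convex sets of utility vectors, whereas here each decision problem is a small finite set of labelled options, purely for illustration. It builds the revealed relation (x is chosen from a set containing y), takes its ancestral relation by chaining, and reports a violation when a strict link lies on a cycle. All function names are hypothetical.

```python
from itertools import product

def revealed_relation(observations):
    """observations: list of (options, chosen) pairs of sets of labels.
    Returns the weak revealed relation as a set of ordered pairs (x, y),
    meaning x was chosen from some set that contained y."""
    weak = set()
    for options, chosen in observations:
        for x, y in product(chosen, options):
            weak.add((x, y))
    return weak

def ancestral(weak):
    """Transitive closure of the weak relation, built by repeated chaining."""
    closure = set(weak)
    changed = True
    while changed:
        changed = False
        for (x, y), (u, z) in product(list(closure), repeat=2):
            if y == u and (x, z) not in closure:
                closure.add((x, z))
                changed = True
    return closure

def violates_acyclicity(observations):
    """True if some strict link x >' y can be chained back from y to x."""
    weak = revealed_relation(observations)
    closure = ancestral(weak)
    strict = {(x, y) for (x, y) in weak if (y, x) not in weak}
    return any((y, x) in closure for (x, y) in strict)

# Choosing x over y, y over z, and z over x goes around in a circle.
cyclic = [({"x", "y"}, {"x"}), ({"y", "z"}, {"y"}), ({"z", "x"}, {"z"})]
consistent = [({"x", "y", "z"}, {"x"}), ({"y", "z"}, {"y"})]
print(violates_acyclicity(cyclic))      # True
print(violates_acyclicity(consistent))  # False
```

The cyclic record is exactly the situation the postulate enjoins: a series of choices, each apparently an improvement, that ends by accepting an alternative previously rejected.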

H4. Continuity. If $x >^* y >^* z$, there are numbers a and b, $0 < a < 1$,
$0 < b < 1$, such that $x >^* ax + (1-a)z >^* y >^* bx + (1-b)z >^* z$.

H4 is a familiar condition in decision theory. It guarantees that "similar
problems generate similar decisions."

For the final condition, we need some additional notions. Let $P_x$ designate
the class of y's such that $y \ge^* x$, and $Q_x$ the class of y's such that $x \ge^* y$;
i.e., $P_x$ is all the points that are "preferred to x" in the sense that some chain
of choices leads from y to x, and conversely, $Q_x$ is the set of points that x is
preferred to. A positive ray R is a line where if x and y are points on R,
$|x_i - y_i| > 0$, and $x_i - y_i$ has the same sign for all i. It is easy to show that
for any positive ray R, and any point x not on R, R intersects $P_x$ and $Q_x$ and
there exists the greatest lower bound (g.l.b.) of $P_x$ on R and the least upper
bound (l.u.b.) of $Q_x$ on R.

H5. Archimedean. For any positive ray R, and any x not on R, g.l.b. $P_x$
coincides with l.u.b. $Q_x$ on R.

H5 is another kind of continuity axiom, in this case, for the boundaries of $P_x$
and $Q_x$. In the form of H5 the condition is perhaps a little heavy handed; but
it avoids having to define derivatives on the preference field and assuming some
limit on the local variation of those derivatives.[19]

H1-H5 are sufficient to demonstrate the theorem

Theorem H1. There is a complete order on X, compatible with $\ge^*$, and
C(S) consists of the maximal points in S with respect to this complete
order.

The proof of Theorem H1 is fairly intricate, and thus has been relegated
to Appendix II. A sketch of the proof is probably sufficient to convey the
essential points.

It follows directly from the definition that $\ge^*$ is transitive, since if
$x \ge^* y$ and $y \ge^* z$, then the defining sequences, $x \ge' x_2, \ldots, x_{n-1} \ge' y$ and
$y \ge' y_2, \ldots, y_{m-1} \ge' z$, form a single sequence connecting x and z. The
transitivity of $\ge^*$ and dominance imply that along a positive ray, the sets $P_x$
are nested.

Continuity implies that the $P_x$'s along a positive ray are "tight," that
is, if $x_i$ approaches x along the ray, then the boundary of $P_{x_i}$ approaches
the boundary of $P_x$. The Archimedean condition assures that the $P_x$'s are
distinct, i.e., if x dominates y, then the boundary of $P_x$ does not intersect
the boundary of $P_y$. Figs. 40 a and b illustrate the kinds of pathologies ruled
out by these two consequences.

Finally, acyclicity implies that the sets of $P_x$'s determined by different
positive rays all fit together to form a single system of sets. The
boundaries of this system of sets form a set of equivalence sets. The choice
set C(S) of a convex set S consists of all the points which lie on the highest
equivalence set that intersects S.

In summary, if it is assumed that an individual can make a choice out of
every convex, closed, bounded-from-above set, where the choice fulfills
the axioms of dominance, acyclicity, and continuity, then the choice can be
formulated as an (ordinal) utility function on the potential actions, and the
choice is made by selecting the action (or actions) with the highest utility.

This theorem complements one derived by Shapley and Shubik, in which
they show that (with a similar underlying model) the assumptions of
connexity, dominance, asymmetry, and continuity imply a transitive preference
function on X.[20] In their case, the need for an Archimedean axiom is obviated
by assuming transitivity for equivalence. Roughly speaking, I have shown that
dominance, continuity, and acyclicity (a sort of weak form of transitivity and
asymmetry) imply connexity.


Figure 40. Pathologies Ruled Out by Continuity and Archimedean Assumptions:
(a) non-tight $P_x$ on ray R (showing $\lim P_y$, $y < x$); (b) non-distinct $P_x$ on ray R.

The application of this result to information and control is suggested
by Fig. 41. The diagram has been simplified to display only two contingencies,
$x_1$ and $x_2$. Five potential decision rules have been drawn on the same diagram
for comparison; in practice, of course, each would fill the entire space.
A and E are limiting cases which do not fulfill the continuity axiom, H4, but
do fulfill axiom H5. The middle rule C is the Bayesian rule, maximize expected
value. The coefficients of the straight lines $ax_1 + bx_2 = c$, normalized so
that a + b = 1, are the equivalent of the probabilities of the contingencies
1 and 2 respectively.

We can conceive of a scale of control ranging from A to E. In the case
of A, the decision maker has complete control of all the contingencies;
he can, in effect, determine which of the contingencies will occur. Thus, he
can use as a utility function for an action x the maximum utility x can
achieve over all the contingencies. The decision rule is thus $\max_x \max_i x_i$.
At the other extreme, the decision maker has no control whatsoever. The most
he can guarantee with any action x is the minimum utility over all the
contingencies. Hence, the decision rule is $\max_x \min_i x_i$. As noted, the middle
case is that where the probabilities are known, and the decision rule is
$\max_x \sum_i p_i x_i$. B and D are the interesting types of cases. In B, the
decision maker has greater control than simply knowing the probabilities, but
not complete control. Furthermore, there may be no way to resolve the
decision problem by determining which of the contingencies are under control.
They may all be equally "incompletely controlled." An illustrative set of
equivalence curves might be $U = ax_1^2 + bx_2^2$. Under what circumstances might a
decision maker choose to behave as in B? A simple case might be one where
the decision maker can influence the probabilities. If he selects an x with
a high $x_i$, the probability of i is increased.

For D, an illustrative utility function might be $U = x_1 x_2$. In analogy
with case B, D might be a case in which increasing $x_i$ decreases the
probability of contingency i. For example, D might be a case where a hostile
opponent has some control of the probabilities.
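
The five rules can be compared on a single small decision problem. The sketch below assumes two contingencies, equal coefficients (a = b = 1/2 for rules B and C), and a decision problem S consisting of all mixtures of two pure actions, sampled on a grid; the function names and the particular actions are hypothetical choices for illustration only.

```python
def rule_a(x):
    return max(x)                             # complete control

def rule_b(x, a=0.5, b=0.5):
    return a * x[0] ** 2 + b * x[1] ** 2      # partial control

def rule_c(x, p=(0.5, 0.5)):
    return p[0] * x[0] + p[1] * x[1]          # known probabilities (Bayesian)

def rule_d(x):
    return x[0] * x[1]                        # partially hostile contingencies

def rule_e(x):
    return min(x)                             # no control

def mixtures(x, y, steps=100):
    """Sampled mixtures a*x + (1-a)*y for a = 0, 1/steps, ..., 1."""
    return [
        tuple((k / steps) * xi + (1 - k / steps) * yi for xi, yi in zip(x, y))
        for k in range(steps + 1)
    ]

def choose(rule, options, tol=1e-9):
    """Choice set: every sampled option whose utility is (nearly) maximal."""
    best = max(rule(o) for o in options)
    return [o for o in options if abs(rule(o) - best) <= tol]

options = mixtures((4.0, 0.0), (0.0, 4.0))
for name, rule in [("A", rule_a), ("B", rule_b), ("C", rule_c),
                   ("D", rule_d), ("E", rule_e)]:
    chosen = choose(rule, options)
    print(name, len(chosen), chosen[0])
```

On this problem rules A and B select only the two pure actions, rule C is indifferent among all mixtures, and rules D and E select the even mixture (2, 2). The last two cases show a decision maker who prefers a mixture of two options to either one separately.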

CHAPTER V. AGGREGATION


1. Collective Judgment

Up to now we have been examining individual judgment from various points
of  view.  In  a sense,  this  has  all  been  introductory  to  the  present  chapter, 
where  we  will  investigate  procedures  for  combining  individual  judgments  into 
group  judgments. 

In the most general sense, we can think of the aggregation process as a
way of combining the information in the heads of the members of a group and
using the pooled information as a basis for a new estimate. In theory, the
group process could consist of each individual stating everything that he can
recall that is relevant to the question under consideration, and then applying
some inductive procedure to the combined list of recalled items. For most
interesting questions, such an exercise, however, is totally impractical. As
we saw in Chapter II, even relatively simple questions can generate a massive
amount of miscellaneous "stuff," covering the gamut of relevance and solidity.
Just how extensive this catalogue is for everyday decisions has never been
explored, so far as I know. Some of it appears to be very difficult to
articulate; and it may even be the case that some of it is inarticulable, either
for lack of appropriate words, or because it does not reach the level of full
consciousness. However, even assuming that all of the material can be elicited
and spread out for full view, and assuming that the qualifications of relevance
and solidity could be expressed in scales comparable across all members of the
group, there would still remain the problem of taking that long list of items
and formulating an answer based on it. At present, we do not have formal
amalgamation techniques for such unstructured material.

In addition to recall of relevant material, it seems to be the case that
there is something which could be called estimation skill; some individuals
can use unstructured inputs more effectively than others to generate estimates.
Presumably, whatever procedure was designed to capitalize on the list of
relevant material would also have to aggregate the skill components of the
estimation process.

One very rough approximation to some of this is found in an aspect of
current practice with group decisions, namely discussion. In a discussion,
it is possible to share both information and "insights," i.e., ways of putting
the information together. There is a fairly extensive literature that
indicates that in practice the sharing is likely to be incomplete. But we
can imagine a process in which all relevant material is elicited, all
"insights" are expressed, each individual separately aggregates the combined
information and hints, and then some group process such as agreeing on a
common answer is used to arrive at a group estimate.

Without  extensive  experiments,  it  is  difficult  to  decide  how  effective 
procedures  of  this  sort  might  be.  Experiments  to  date  do  not  give  a clear 
picture  of  the  relative  effectiveness  of  various  types  of  group  interaction. 
There are a number of obscuring factors: differences in the type of
estimation task (kind of question), difficulties in controlling group dynamics,
variations in figures of merit, and, of course, variations in individual
performance. Small groups are remarkably complex objects and the number of
potential kinds of organizations that can be devised to carry out even so
simple a task as estimating the answer to an uncertain question is practically
infinite.

One methodological hypothesis that has guided a great deal of the present
work is just this: to a first approximation, the most complete summary that
an individual can (practically) furnish concerning what he knows about a
question is just his estimate of the answer to that question, plus, perhaps,
an estimate of the solidity of his answer. In formulating his answer, the
individual has taken into account the nuances of relevance and shadings of
solidity that apply to his own information.

The hypothesis is not easy to verify, or, for that matter, to express in a
manner leading to simple experiments. That's why I call it a working hypothesis.

The hypothesis suggests that most of what the group has to offer can be
realized by starting with separate, in fact, independent, estimates from the
members of the group and seeking the most effective formal ways to combine
these independent estimates into a group estimate. We can call this simple
procedure an elementary group estimate. A number of ancillary desirable
features come along with this approach: (a) The definition of the group
process can be made explicit and precise. (b) Application of figures of
merit can be pursued by theoretical investigations, as well as by experiment.
(c) A kind of "rock bottom" level of performance is defined which can act as
a criterion for other group procedures. (d) The procedures are remarkably
easy to implement (and replicate) in practice.

For elementary group estimates, then, there will be a set of individual
responses $R = (R_1, \ldots, R_n)$ where $R_i$ is the response of individual i, and n is
the number of members of the group. These responses are relative to a
universe of discourse U, and (usually) to a specific question concerning U.

For most cases, the specific question will be represented by a particular
partition of U into an event space $\{E_j\}$, in which case the individual
responses will be of the form $R_{ij}$, i's estimate for the event $E_j$.

A group judgment is some function F, which generates a group response
G based on the responses R. F can be a function of more than just the overt
individual responses; it may depend on the specific group (e.g., in the form
of differential weights on the individuals), or on the form of the question.
By and large, factors of this sort will be dealt with by specific notation
when called for. For general discussion, we can write G = F(R).

With this simple definition of a group response, we can investigate a
number of pertinent questions: (1) How does the group compare to the individual
members of the group? This question divides into two subquestions:
(a) How does the group performance compare to the average performance of the
individuals? (b) How does the group compare to the best individual?
(2) How does the accuracy of the group depend on the amount of disagreement
(dispersion) among the members? (3) How does the group compare to the a priori
knowledge available without the group? (4) How much is lost by employing
various approximative techniques for aggregating the individual responses?

The generic notion of an n-heads rule will be used to refer to the
demonstration that the group performance is superior to the individual
performance in some specified way. Given the wide variety of estimation types,
and the range of figures of merit discussed in Chapter III, a broad spectrum
of n-heads rules can be explored.

2. Basic Rules

In this section a number of n-heads rules are examined that are of a
particularly simple form. They apply, for the most part, to any quantities
that are determined at least to an interval scale. The rules assume only the
existence of a set of individual estimates R, and an unknown true response T.
They are "distribution free," i.e., are independent of the shape of the
distribution of the responses. In this respect, the rules are more akin to
arithmetic than to statistics.

Given R and T, there are three definitional items needed to formulate
an n-heads rule: (1) an aggregation rule F(R), (2) a score rule S(R,T), and
(3) a criterion for comparing individual and group scores. A typical criterion
is the difference between the average individual score and the group score.
Generally, it is not possible to optimize such a criterion for a given score
rule, since T is unknown. However, it is possible to establish useful
inequalities.

The number of possible combinations of aggregation rules, score rules and
criteria is essentially unlimited. In this section I have limited the
investigation to a few aggregation rules resembling measures of central
tendency, to various simple scaled distance scores, and to either the
criterion comparing the group score with the average individual score, or the
analogous criterion with the median substituted for the average.

Table  I displays  seven  such  rules  showing  the  aggregation  function,  the 
score  rule,  and  the  n-heads  statement.  Number  1,  for  example,  states  that 
the  error  of  the  mean  (average)  is  always  less  than  or  equal  to  the  average 
individual  error. 


Table I. Elementary N-Heads Rules

    Aggregation function        Score rule               N-heads rule

 1. $\bar{R}$ (Average)         $|R - T|$                $|\bar{R} - T| \le \frac{1}{n}\sum |R - T|$

 2. $\bar{R}$ (Average)         $(R - T)^2$              $(\bar{R} - T)^2 \le \frac{1}{n}\sum (R - T)^2$

 3. $\bar{R}$ (Average)         $|(R - T)/T|$            $|(\bar{R} - T)/T| \le \frac{1}{n}\sum |(R - T)/T|$

 4. $\bar{R}$ (Average)         $((R - T)/T)^2$          $((\bar{R} - T)/T)^2 \le \frac{1}{n}\sum ((R - T)/T)^2$

 5. Md (Median)                 $|R - T|$                $|Md - T| \le Md\,|R - T|$

 6. GM (Geometric Mean)         $|\log R - \log T|$      $|\log GM - \log T| \le \frac{1}{n}\sum |\log R - \log T|$

 7. HM (Harmonic Mean)          $|1/R - 1/T|$            $|1/HM - 1/T| \le \frac{1}{n}\sum |1/R - 1/T|$

All $\sum$'s are over the set of individual responses.

Except for 5, the rules state that a given scaled distance score is
smaller for some measure of central tendency than the average of that score
for the individuals. 5 has the same form except that the median is
substituted for the average. The analogues of 2, 3, 4 for the mean hold for
the other three, but don't appear to have much relevance for present practice.
7 may also appear to be somewhat "academic," since the harmonic mean has not
been used, in my experience, to aggregate individual estimates. I have
included it in part to show that the general form of an elementary n-heads
rule is not restricted to the better known types of scores or aggregation
rules; but also it is possible that for some types of estimates (e.g.,
ratios) the harmonic mean may be an appropriate aggregation function.

The rules involving averages of absolute values all follow from a single
principle, namely

$$\Big|\sum_i x_i\Big| \le \sum_i |x_i| \qquad (1)$$

which follows directly from

$$\Big|\sum_i x_i\Big| = \Big|\sum_{+} x_i - \sum_{-} |x_i|\Big|,$$

where $\sum_{+}$ is the sum over the positive x's and $\sum_{-}$ is the sum
over the negative x's. This is clearly less than or equal to
$\sum_{+} x_i + \sum_{-} |x_i| = \sum_i |x_i|$.


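The rules involving squared differences follow from

$$\frac{1}{n}\sum_i x_i^2 \ge \Big(\frac{1}{n}\sum_i x_i\Big)^2 \qquad (2)$$

Set $\frac{1}{n}\sum_i x_i = m$; then

$$\frac{1}{n}\sum_i (x_i - m)^2 \ge 0$$

$$\frac{1}{n}\sum_i (x_i^2 - 2x_i m + m^2) \ge 0$$

$$\frac{1}{n}\sum_i x_i^2 - m^2 \ge 0$$

whence (2) follows.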

The various rules follow from (1) and (2) by substituting the appropriate
expression for x; e.g., $x_i = (R_i - T)/n$ for 1. The scaled rules, 3 and 4,
follow from 1 and 2 via the fact that dividing each side of an inequality by
a positive constant does not affect the inequality. The rules involving
scaled values are not significant if T = 0, whence most of the scaled rules
are appropriate only for ratio scales with T greater than zero.

From (2) we can formulate a more illuminating form of 2, namely,

$$(\bar{R} - T)^2 = \frac{1}{n}\sum_i (R_i - T)^2 - \mathrm{Var}(R) \qquad (3)$$

(3) is the same as (1) in Chapter III, now applied to a set of individual
estimates, rather than to a sequence of estimates by a single individual.

(3) states that the squared error of the mean is equal to the average
squared error of the individuals minus the variance of the individual
responses. Thus, the advantage of the mean over the individual increases with
the amount of disagreement among the individuals.
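The inequalities for the average and the median, and the identity relating
squared error to variance, can be checked numerically. The following is a
minimal sketch, not part of the original text: the response values, the true
value, and the function name `basic_rule_sides` are all hypothetical, chosen
only for illustration. The variance is taken as the population variance
(divisor n), which is what the identity requires.

```python
import math
from statistics import fmean, median, pvariance


def basic_rule_sides(responses, truth):
    """Return (group side, individual side) pairs for rules 1, 2 and 5
    of Table I, plus both sides of the squared-error identity (3)."""
    mean_r = fmean(responses)
    abs_err = [abs(r - truth) for r in responses]
    sq_err = [(r - truth) ** 2 for r in responses]
    return {
        "rule_1": (abs(mean_r - truth), fmean(abs_err)),
        "rule_2": ((mean_r - truth) ** 2, fmean(sq_err)),
        "rule_5": (abs(median(responses) - truth), median(abs_err)),
        # Identity (3): squared error of the mean equals the average
        # squared error minus the (population) variance of the responses.
        "identity_3": ((mean_r - truth) ** 2,
                       fmean(sq_err) - pvariance(responses)),
    }


# Hypothetical estimates and true value, for illustration only.
responses = [12.0, 15.0, 9.0, 30.0, 14.0]
truth = 18.0

for name, (group_side, individual_side) in basic_rule_sides(responses, truth).items():
    print(f"{name}: {group_side:.4f} vs {individual_side:.4f}")
```

Because the rules are distribution free, any other list of numbers and any
other value of `truth` should satisfy the same three inequalities and the
same identity.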

These elementary n-heads rules provide a justification for using a group
estimate where there is little or no basis for invoking one of the theories
of estimation from Chapter II. Their meaningfulness for group estimation may
seem a little mysterious since the rules themselves are true for any set of
numbers R, and any other number T. The link is provided by the tacit
assumption in practice that the group furnishing the estimates R have some
pertinent information concerning the number T.


We can derive a statement which contains the size of the group as an
explicit factor. Roughly speaking, the error of the group declines with
increasing n, the number of members of the group. This will be stated only
for the most elementary formulation of this relationship, primarily to
illustrate that even for the "rock bottom" n-heads rules, n is a significant
parameter. More diagnostic formulae relating the size of the group to
accuracy can be derived using the various theories of estimation.

Suppose we have a group of n individuals. We can ask how the accuracy of
this group compares with the average accuracy of the n subgroups that can be
formed by leaving out one member at a time. Writing $x_i$ for the response
$R_i$, let $\bar{x}^j$ designate the average of the n-1 responses omitting
response $R_j$, i.e., $\bar{x}^j = \frac{1}{n-1}\sum_{i \ne j} x_i$. We have,
from 1, Table I,

$$\Big|\frac{1}{n}\sum_j \bar{x}^j - T\Big| \le \frac{1}{n}\sum_j |\bar{x}^j - T|$$

and

$$\frac{1}{n}\sum_j \bar{x}^j = \frac{1}{n}\sum_j \frac{1}{n-1}\sum_{i \ne j} x_i = \frac{1}{n}\sum_i x_i,$$

whence

$$\Big|\frac{1}{n}\sum_i x_i - T\Big| \le \frac{1}{n}\sum_j |\bar{x}^j - T| \qquad (4)$$

(4) asserts that the average error of the subgroups with n-1 members is
greater than (or equal to) the error of the total group with n members.
Since this is true for any n, the average error monotonically decreases with
n (providing we average over all available respondents for each potential
group of n.)
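The leave-one-out comparison can be sketched in a few lines. This is an
illustrative check under assumed data, not the author's computation: the
responses, the true value, and the function name `full_vs_leave_one_out` are
hypothetical.

```python
from statistics import fmean


def full_vs_leave_one_out(responses, truth):
    """Return (error of the full-group mean,
    average error of the n subgroup means formed by omitting one member)."""
    n = len(responses)
    total = sum(responses)
    full_error = abs(total / n - truth)
    # Mean of the n-1 remaining responses when response r is left out.
    subgroup_errors = [abs((total - r) / (n - 1) - truth) for r in responses]
    return full_error, fmean(subgroup_errors)


# Hypothetical estimates and true value, for illustration only.
responses = [12.0, 15.0, 9.0, 30.0, 14.0]
truth = 16.5
full_error, avg_subgroup_error = full_vs_leave_one_out(responses, truth)
print(full_error, avg_subgroup_error)
```

Equality occurs when every subgroup mean lies on the same side of the true
value, for instance when all responses are identical.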

Another way of viewing the elementary n-heads rules in Table I that may
illuminate their applicability to group estimation is the following. Suppose
we assume that each individual has an equal probability of being correct.
We would like to find a group estimate G that minimizes the expected error.
In the case of the squared distance as the figure of merit, we would like to
minimize the expectation of $(G - T)^2$. In case individual i is correct, the
error would be $(G - R_i)^2$ and the expected error is then
$\frac{1}{n}\sum_i (G - R_i)^2$. If we differentiate this expression with
respect to G, and set the result equal to 0, we obtain

$$\frac{2}{n}\sum_i (G - R_i) = 0$$

whence

$$G = \frac{1}{n}\sum_i R_i \qquad (5)$$

In the context of adjudicating disagreement within the group, each
individual i "sees" the group as making the error $(G - R_i)^2$. (5) states
that $\bar{R}$ minimizes the average perception of the group error.

3. Theory of Errors

The theory of errors, as expounded in Chapter II, assumes that each
individual's response is a sum of the true answer, a bias term, and a random
error; i.e., $R_i = T + B_i + e_i$, where T and $B_i$ are constants and $e_i$
is distributed with zero mean and some standard deviation $S_i$.* Each
individual response is thus a random variable, with a distribution
$D_i(R_i)$, with standard deviation $S_i$ and mean $M_i = T + B_i$.

By definition, the error distributions of different individuals are
independent, since the errors are assumed to be random. The vector R thus
has the joint distribution $D(R) = \prod_i D_i(R_i)$. The joint distribution
D(R) determines a distribution for the average of the individual responses,
$\bar{R} = \frac{1}{n}\sum_i R_i$; we can call this derived distribution
$D(\bar{R})$. The importance of $D(\bar{R})$ for group estimation lies in the
presumption that $\bar{R}$ is a reasonable expression of the group response.
Part of the basis for this presumption is simple carryover from standard
statistics, where $\bar{R}$ is the most common representative statistic, or
equivalently, the most common measure of central tendency. In standard
statistics the role of a construct like $\bar{R}$ is to characterize a
population. The role of a group response in group decisions is to obtain the
most accurate (or highest scoring) estimate based on the individual
responses. We will show below that there are some persuasive n-heads rules
associated with $\bar{R}$. However, it should be emphasized that these rules
are not simple extensions of the role of $\bar{R}$ in standard statistics. In
particular, the notion of $\bar{R}$ as a representative statistic is probably
misleading as an "explanation" for its usefulness as a group response.

* The non-conventional notation S is used for the standard deviation in
this section to allow a simple distinction between expressions referring
to estimates and expressions referring to the logarithms of estimates.

By a well known result,² the mean M of $\bar{R}$ is

$$M = \frac{1}{n}\sum_i M_i \qquad (6)$$

The variance $S^2$ of $\bar{R}$ is

$$S^2 = \frac{1}{n^2}\sum_i S_i^2 \qquad (7)$$

or

$$S = \frac{1}{\sqrt{n}}\sqrt{\frac{1}{n}\sum_i S_i^2} \qquad (8)$$

² (6) holds for any joint distribution. (7) contains the assumption that
the individual responses are independent.

The first n-heads rule to follow from the theory of errors, then, could be
labelled the n-heads rule for the standard deviation. (8) asserts that the
standard deviation of $\bar{R}$ is $1/\sqrt{n}$ times the square root of the
average of the individual variances. If the individual variances are roughly
equal, then the standard deviation of the group response is less than the
individual standard deviations by a factor of $1/\sqrt{n}$. In behavioral
terms, the average random error of $\bar{R}$ will be smaller by a factor of
$1/\sqrt{n}$ than the random error of the individuals. Thus the group
response will be more stable than the individual responses, and the
likelihood of large random errors on the part of the group will be reduced.
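The n-heads rule for the standard deviation can be illustrated by simulation.
The sketch below is an assumption-laden illustration rather than anything
from the original study: it draws independent, unbiased, normally distributed
errors (the rule itself does not require normality), and the individual
standard deviations, trial count, seed, and function names are hypothetical.

```python
import math
import random


def predicted_group_sd(individual_sds):
    """Standard deviation of the mean: 1/sqrt(n) times the square root
    of the average individual variance, as in (8)."""
    n = len(individual_sds)
    mean_variance = sum(s * s for s in individual_sds) / n
    return math.sqrt(mean_variance) / math.sqrt(n)


def simulated_group_sd(truth, individual_sds, trials=20_000, seed=0):
    """Standard deviation of the group mean over repeated independent draws."""
    rng = random.Random(seed)
    n = len(individual_sds)
    means = [
        sum(truth + rng.gauss(0.0, s) for s in individual_sds) / n
        for _ in range(trials)
    ]
    centre = sum(means) / trials
    return math.sqrt(sum((m - centre) ** 2 for m in means) / trials)


# Hypothetical individual standard deviations, for illustration only.
sds = [1.0, 1.5, 2.0, 0.5]
print(predicted_group_sd(sds), simulated_group_sd(10.0, sds))
```

The two printed values should agree to within sampling error; increasing
`trials` tightens the agreement.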

(8) tells us nothing about the bias of $\bar{R}$. A second n-heads rule can
be derived which deals with the bias. We have
$M = \frac{1}{n}\sum_i M_i = T + \frac{1}{n}\sum_i B_i$. The bias of the mean
$\bar{B}$ is thus $\frac{1}{n}\sum_i B_i$. From 1, Table I,

$$\Big|\frac{1}{n}\sum_i B_i\Big| \le \frac{1}{n}\sum_i |B_i| \qquad (9)$$

In words, the bias of the mean is always less than or equal to the mean
absolute bias of the individuals. Invoking (3) we can assert

$$\bar{B}^2 = \frac{1}{n}\sum_i B_i^2 - \mathrm{Var}(B) \qquad (10)$$

In words, the squared bias of the mean is equal to the average of the
individual squared biases minus the variance of the individual biases. If
all the individual biases are the same, then the group offers no advantage as
far as bias is concerned; if the individual biases differ, then the group
advantage is measured directly by the variance of the individual biases.

For the total error, the two effects "add"; the expected squared error
of the mean is $E(\bar{R} - T)^2 = \bar{B}^2 + S^2$. The same expression
holds for the individual expected squared errors, i.e.,
$E(R_i - T)^2 = B_i^2 + S_i^2$. Thus, if we abbreviate

$$\frac{1}{n}\sum_i E(R_i - T)^2$$

by ESE, we have

$$E(\bar{R} - T)^2 = \frac{1}{n}\,\mathrm{ESE} + \frac{1}{n^2}\sum_{i \ne j} B_i B_j \qquad (11)$$

If the individual biases include both positive and negative instances, so
that the second term in (11) is small, then abbreviating $E(\bar{R} - T)^2$
by EM (expected error of the mean), we have

$$\mathrm{EM} \approx \frac{1}{\sqrt{n}}\,\mathrm{ESE} \qquad (12)$$

As an illustration, consider the values in Table II.


Table II

      $B_i$       $S_i$       $\mathrm{ESE}_i$

       -1          1            2
        2          1.5          6.25
        3          2            13

    ESE = 7.0833
    EM = 4.1107
    $\frac{1}{\sqrt{n}}$ ESE = 4.090
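The bias rules and the decomposition of the expected squared error can be
traced step by step. The sketch below takes the bias and standard deviation
columns of Table II as inputs and assumes, as in (7), that the individual
errors are independent. The variable names are hypothetical, and the sketch
covers only the individual expected squared errors, the bias inequality, the
bias identity, and the cross-term decomposition; it does not attempt to
reproduce the summary figures printed beneath the table.

```python
import math
from statistics import fmean, pvariance

biases = [-1.0, 2.0, 3.0]   # B_i column of Table II
sds = [1.0, 1.5, 2.0]       # S_i column of Table II
n = len(biases)

# Individual expected squared errors, B_i^2 + S_i^2, and their average.
ese_individual = [b * b + s * s for b, s in zip(biases, sds)]
ese_average = fmean(ese_individual)

# Bias of the mean versus the mean absolute bias, as in (9).
bias_of_mean = fmean(biases)
mean_abs_bias = fmean([abs(b) for b in biases])

# Squared bias of the mean versus mean squared bias minus variance, as in (10).
mean_sq_bias = fmean([b * b for b in biases])

# Expected squared error of the mean, computed two ways:
# directly as squared bias plus variance of the mean, and via the
# average individual expected squared error plus bias cross terms.
variance_of_mean = sum(s * s for s in sds) / n ** 2
direct = bias_of_mean ** 2 + variance_of_mean
cross_terms = sum(biases[i] * biases[j]
                  for i in range(n) for j in range(n) if i != j)
via_cross_terms = ese_average / n + cross_terms / n ** 2

print(ese_individual, round(ese_average, 4))
print(bias_of_mean, mean_abs_bias)
print(direct, via_cross_terms)
```

The two routes to the expected squared error of the mean agree for any choice
of biases and standard deviations, since the second is an algebraic
rearrangement of the first.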


The situation is somewhat more complex for the expectation of the
absolute error, $E|R - T|$. For any distribution D(R) of R, the expected
absolute error can be computed as

$$E|R - T| = \int_{-\infty}^{T} (T - R)\,D(R)\,dR + \int_{T}^{\infty} (R - T)\,D(R)\,dR \qquad (13)$$

Rearranging the terms in (13) and adding and subtracting
$T\int_{-\infty}^{T} D(R)\,dR$ and $\int_{-\infty}^{T} R\,D(R)\,dR$ we obtain

$$E|R - T| = M - T + 2T\int_{-\infty}^{T} D(R)\,dR - 2\int_{-\infty}^{T} R\,D(R)\,dR \qquad (14)$$

(14) is not particularly informative without knowing the form of the
distribution D(R). If we introduce the psychonumeric hypothesis from
Chapter II, i.e., assume that the individual responses are log normal, and in
addition assume that they are independent, then we can derive that the
distribution of the geometric mean $\big(\prod_i R_i\big)^{1/n}$ is log
normal.³


Using the notational convention of Chapter II, where lower case letters
refer to the logarithms of quantities expressed by upper-case letters,
$r_i = \log R_i$, $m_i$ is the mean of individual i's log response
distribution, $\bar{r} = \frac{1}{n}\sum_i r_i$ is the logarithm of the
geometric mean of R, and $m = \frac{1}{n}\sum_i m_i$ is the mean of
$D(\bar{r})$, the distribution of mean log responses. Corresponding to (7)
we have $s^2 = \frac{1}{n^2}\sum_i s_i^2$, where $s_i^2$ is the variance of
the individual log responses, and $s^2$ is the variance of $\bar{r}$.
$D(\bar{r})$ can be formulated explicitly; it is

$$D(\bar{r}) = \frac{1}{\sqrt{2\pi}\,s}\,e^{-(\bar{r} - m)^2/2s^2} \qquad (15)$$

There is a corresponding expression for R, but there is no particular
point in writing it down here.

Introducing (15) in (14) with the appropriate transformation of
variables, we have (ae means average error)

$$ae = m - t + 2t\int_{-\infty}^{t} D(\bar{r})\,d\bar{r} - 2\int_{-\infty}^{t} \bar{r}\,D(\bar{r})\,d\bar{r} \qquad (16)$$

If we set b = (t - m)/s, and perform the integration on the last term, and
recombine, we obtain

$$ae = -bs + 2bs\,\Phi(b) + \sqrt{2/\pi}\;s\,e^{-b^2/2} \qquad (17)$$

where $\Phi(b)$ is the cumulative normal distribution with zero mean and unit
standard deviation evaluated at b. Thus, we can write

$$ae = s\,f(b) \qquad (18)$$

where $f(b) = -b + 2b\,\Phi(b) + \sqrt{2/\pi}\;e^{-b^2/2}$.

(18) is perhaps deceptively simple, since b involves s.

If we take the derivative of f(b) with respect to b, we obtain

$$\frac{df(b)}{db} = -1 + 2\Phi(b) \qquad (19)$$

The cumulative normal is virtually a constant beyond b = 2; hence for
$b \ge 2$, f(b) is essentially a straight line with slope 1 and ae is
directly proportional to s.
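The closed form for the expected absolute error of a normally distributed
quantity can be checked against direct numerical integration. The sketch
below is illustrative only: it evaluates the expected value of |z - b| for a
standard normal z, with the exponential term entering positively as direct
integration gives, and compares it with a trapezoid-rule estimate. The
function names, integration limits, and step count are assumptions.

```python
import math


def normal_cdf(b):
    """Cumulative standard normal distribution evaluated at b."""
    return 0.5 * (1.0 + math.erf(b / math.sqrt(2.0)))


def f_closed_form(b):
    """Expected |z - b| for standard normal z, in closed form."""
    return (-b + 2.0 * b * normal_cdf(b)
            + math.sqrt(2.0 / math.pi) * math.exp(-b * b / 2.0))


def f_numeric(b, lo=-10.0, hi=10.0, steps=20_000):
    """Trapezoid-rule estimate of the same expectation."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        z = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * abs(z - b) * math.exp(-z * z / 2.0)
    return total * h / math.sqrt(2.0 * math.pi)


for b in (0.0, 0.5, 1.0, 2.0, 3.0):
    slope = 2.0 * normal_cdf(b) - 1.0   # derivative of f at b
    print(b, f_closed_form(b), f_numeric(b), slope)
```

The printed slopes approach 1 as b grows past 2, which is the near-linear
regime described in the text.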

In Figure 42 the observed log error is plotted against the observed
standard deviation of log responses for roughly 300 almanac questions.⁴
The subjects were upper-class and graduate college students. The number of
subjects per question was about 14 (ranging from 11 to 15). The lower
dashed line is computed from (17) assuming b = 0, and assuming that the
observed standard deviation is an acceptable estimator of
$\sqrt{\frac{1}{n}\sum_i s_i^2} = \sqrt{n}\,s$. The latter assumption is
correct only if the variance of the bias is approximately equal to $s^2$, the
variance of $\bar{r}$. There is reason to suspect that for this set of data,
the variance of the bias is greater than $s^2$, in which case the observed
standard deviation is an overestimate, and the dashed line should be lower;
however, there is no way to determine the individual variances from the data,
and Fig. 42 can be used only to set a lower bound on the bias.

From Fig. 42, $E/s \approx .65\sqrt{14} \approx 2.43$. Since E/s is the
estimate of the bias, b, we see that for this data, b > 2, and hence, the
relationship between E and observed standard deviation should be a simple
proportion, which indeed the figure demonstrates.

[Figure 42. Relation Between Log Error and Observed Standard Deviation.
Horizontal axis: observed standard deviation.]

For this particular set of data, then, the bias makes a much larger
contribution to average error than the average random deviation. We would
expect that the major contribution of the group in reducing the expected
error arises from reducing the bias via (9) rather than from reducing the
standard deviation via (8), but additional analysis of the data would be
required to establish that hypothesis.

4. Factor Model

Most of the formalism needed to discuss n-heads rules for factor models
of estimation has been presented in the treatment of correlation and uniform
weights in Section 7, Chapter IV. In fact, the analogy between aggregating
a set of individual responses, and forming a model of multi-dimensional
functions is quite close.

In keeping with the spirit of the factor model, a somewhat more general
kind of aggregation rule will be examined. Rather than specializing
immediately to the simple average, we first look at a weighted average, i.e.,
$\bar{R} = \sum_i a_i R_i$, where $\bar{R}$ is the group response and $R_i$
is the response of individual i. The notion of a weighted average for
aggregating individuals has a certain amount of appeal, on the grounds that
some individuals are more likely to give an accurate response than others,
either because of greater information or greater skill in forming estimates
or both. However, the issue of how to assign the weights does not appear to
have a satisfactory resolution at present. I will use the results presented
in the treatment of equal weights in Chapter IV to show that unless the
individuals are very different in their capabilities, little is gained by
non-uniform weighting.

Using correlation as the figure of merit for the estimates, we have
$\rho(\bar{R},T) = E[(\bar{R} - E\bar{R})(T - ET)/s_R s_T]$. Unpacking this
expression in terms of $R_i$, and rearranging, we obtain

$$\rho(\bar{R},T) = \frac{1}{s_R}\sum_i a_i s_i\,\rho(R_i,T) \qquad (20)$$

where $s_i$ is the standard deviation of individual i's estimates and
$\rho(R_i,T)$ is the correlation of individual i's estimates with the true
values T. The correlation of the weighted average of the individual
estimates with the true values is just the weighted average of the individual
correlations with $a_i s_i/s_R$ as weights. Since
$s_R^2 = E[(\bar{R} - E\bar{R})^2]$, unpacking gives

$$s_R^2 = \sum_i a_i^2 s_i^2 + 2\sum_{i<j} a_i a_j s_i s_j\,\rho(R_i,R_j) \qquad (21)$$

where $\rho(R_i,R_j)$ is the correlation of individual i's estimates with
individual j's estimates.

There is no loss of generality in assuming that the individual
estimates have been normalized by z-scores, so that $s_i = 1$ for all i,
and (20) and (21) thus become

$$\rho(\bar{R},T) = \frac{1}{s_R}\sum_i a_i\,\rho(R_i,T) \qquad (22)$$

$$s_R^2 = \sum_i a_i^2 + 2\sum_{i<j} a_i a_j\,\rho(R_i,R_j) \qquad (23)$$

It is clear from (23) that $s_R$ is a maximum when $\rho(R_i,R_j) = 1$ for
all i and j. In this case,
$s_R^2 = \sum_i a_i^2 + 2\sum_{i<j} a_i a_j = \big(\sum_i a_i\big)^2$,
and thus

$$\rho(\bar{R},T) \ge \sum_i \frac{a_i}{\sum_j a_j}\,\rho(R_i,T) \qquad (24)$$

That is, the correlation of $\bar{R}$ (weighted average of the individual
estimates) with T is greater than or equal to the weighted average of the
individual correlations. In particular, if we have the case of equal weights,

$$\rho(\bar{R},T) \ge \frac{1}{n}\sum_i \rho(R_i,T) \qquad (25)$$

(25) is the most straightforward form of n-heads rule for factor models.
It asserts that the correlation of the average of a set of estimates with
the true answer is greater than the average of the individual correlations,
equality occurring only in the uninteresting case that all the individuals
give the same estimates.

If the responses of the individuals are independent, i.e.,
$\rho(R_i,R_j) = 0$ for all i and j with $i \ne j$, then setting
$\bar{\rho} = \frac{1}{n}\sum_i \rho(R_i,T)$,

$$\rho(\bar{R},T) = \sqrt{n}\,\bar{\rho} \qquad (26)$$

The correlation of the average response with the true answer is precisely
$\sqrt{n}$ times the average correlation of the individual responses with the
true answer. The assumption of independence does not appear very plausible if
the set of questions all refer to the same quantity, i.e., each individual is
making estimates within the same model. However, the formal apparatus
developed above applies equally well to the case of a list of separate
questions. It is not clear that the correlation is a useful figure of merit
in the case of a miscellaneous string of questions, but to the extent that
covariance of an individual's answers with the true answers indicates some
knowledge, and to the extent that the individual's answers are independent,
(26) indicates a strong advantage of the average answer over the individual
answers.
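Both correlation rules can be illustrated with a small constructed data set.
The sketch below is hypothetical throughout: the four +1/-1 patterns are
mutually uncorrelated, zero-mean sequences standing in for answers to eight
questions, the "true" values are their sum, and the helper names `pearson`
and `average_response` are introduced only for this example. In each set the
individual responses have equal standard deviations, which matches the
z-score normalization assumed in the text.

```python
import math
from statistics import fmean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def average_response(responses):
    """Equal-weight group response, question by question."""
    return [fmean(column) for column in zip(*responses)]


# Four mutually uncorrelated, zero-mean patterns over eight questions.
h1 = [1, -1, 1, -1, 1, -1, 1, -1]
h2 = [1, 1, -1, -1, 1, 1, -1, -1]
h3 = [1, -1, -1, 1, 1, -1, -1, 1]
h4 = [1, 1, 1, 1, -1, -1, -1, -1]
truth = [a + b + c + d for a, b, c, d in zip(h1, h2, h3, h4)]

# Independent individuals: each sees a different component of the truth.
independent = [h1, h2, h3]
# Correlated individuals: each also shares the common component h4.
correlated = [[a + d for a, d in zip(h, h4)] for h in (h1, h2, h3)]

for label, group in (("independent", independent), ("correlated", correlated)):
    mean_rho = fmean([pearson(r, truth) for r in group])
    group_rho = pearson(average_response(group), truth)
    print(label, mean_rho, group_rho)
```

In the independent case the group correlation equals the square root of n
times the average individual correlation; in the correlated case it still
exceeds the average, but by a smaller factor.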

In general, there will be a set of optimal weights for the individuals
which maximizes the correlation of the weighted average with the true
answer.* Such weights are difficult to come by in practice. However, there
is a simple analogy between estimating a quantity with a linear combination
of variables, and estimating a quantity by a linear combination of separate
individual estimates. If the optimal weights are completely unknown, then
(14) Chapter IV can be invoked to demonstrate that the maximum expected
correlation of an assumed linear combination of individual estimates with
the optimally weighted combination is obtained with equal weights. As is
clear from the numerical results in Chapter IV, a great deal must be known
about the individual estimates before something better than uniform weights
can be devised.

* It is necessary to know both the individual correlations $\rho(R_i,T)$ and
the intercorrelations $\rho(R_i,R_j)$.

5.  The  Impossibility  Theorem 

The aggregation of probability estimates differs from magnitude
estimates in that the theory of probability imposes a relatively rigid set of
constraints on the resultant group estimates. In Chapter II, Section 5, the
three axioms

A1. $0 \le P(E)$

A2. $P(U) = 1$

A3. $P(E \vee F) = P(E) + P(F)$, providing $E \cdot F = 0$,

were listed as basic postulates for numerical probability. Assuming that
the individual members of the group are consistent probability estimators,
their estimates will follow A1-A3. Similarly, if we have a group function,
F(R), it must also fulfill A1-A3.

If we let L stand for the unit-vector (all components = 1), we have

G1. $0 \le F(R)$

G2. $F(L) = 1$

G3. $F(R+S) = F(R) + F(S)$, providing $R_i + S_i \le 1$,

where $R+S = (R_1+S_1, \ldots, R_n+S_n)$. G3 may seem a little strong, since
A3 is asserted only for those cases in which the appropriate events are
exclusive. However, given any R and S which fulfill $R_i + S_i \le 1$ there
is a potential set of estimates for some E and F where E and F are exclusive
and R and S are the individual estimates for E and F respectively, so there
is no loss of generality in omitting the exclusivity condition.

G4. F is a function solely of the numerical vector R.

Other functions, where F depends on additional features of the decision
situation, can be devised. For example, F could involve various kinds of
dependencies among the $R_i$. Functions of this sort will be treated in
Section 7 below. In the present section, attention is limited to functions
which depend only on the set of individual estimates.

One additional assumption completes the set,

G5. F is a continuous function of R.

Theorem 1: A1-A3, G1-G5 imply that $F = \sum_i a_i R_i$, where
$\sum_i a_i = 1$.

The theorem states that the only function fulfilling A1-A3, G1-G5, is
the linear function $\sum_i a_i R_i$ with constant coefficients summing to 1.
In other words, the group estimate is a weighted average of the individual
estimates.

Lemma 1: F(L-R) = 1 - F(R).

Proof: F(L) = F(L-R + R) = F(L-R) + F(R), from G3.
From G2, F(L) = 1, whence the result follows.

Lemma 2: F(aR) = aF(R), where a is any positive real number, $aR_i \le 1$.

Proof: From G3, F(nR) = nF(R), $nR_i \le 1$, where n is an
integer. Similarly, $F(R) = mF(\frac{1}{m}R)$. Putting these two together,
we get $F(\frac{n}{m}R) = \frac{n}{m}F(R)$. Since F is continuous in R, the
result follows.

Lemma 3: If f(x) is a function of a single variable, and
f(x+y) = f(x) + f(y), then f(x) = ax, with constant a.

Proof: By an argument similar to that in Lemma 2, we obtain
f(ax) = af(x). Since f is a function of a single variable, we
can set $x = 1 \cdot x$, whence $f(1 \cdot x) = x f(1)$, and setting
a = f(1), the result follows.

Lemma 4: Let $R^i$ denote the vector where $R^i_i = R_i$, and $R^i_j = 0$,
$j \ne i$. Then $F(R) = \sum_i F(R^i)$.

Proof: From G3 and $R = \sum_i R^i$.

Lemma 5: Let $F_i(R) = F(R^i)$. Then $F_i(R) = a_i R_i$.

Proof: Since $F_i(R)$ is a function of $R_i$ alone, Lemma 3 applies.

Putting Lemma 4 and Lemma 5 together, we obtain $F(R) = \sum_i a_i R_i$.

Lemma 6: $\sum_i a_i = 1$.

Proof: $L = \sum_i L^i$, whence $F(L) = \sum_i F(L^i) = 1$;
but $\sum_i F(L^i) = \sum_i a_i$.

This completes the proof of the theorem.

In a previous publication, I announced an impossibility theorem for
aggregation of probability estimates. The impossibility arises from
adding one further condition to G1-G5, namely:

G6. $F(R*S) = F(R)F(S)$, where $R*S = (R_1S_1, \ldots, R_nS_n)$.

G6 embodies the product rule, $P(E \cdot F) = P(E)P(F|E)$. This rule is
sometimes taken as a postulate in probability theories, and sometimes taken
as a consequence of the definition of the conditional probability
$P(F|E) = P(E \cdot F)/P(E)$. As in the case of exclusivity for G3, the
analogue of the product rule for groups, G6, must be expressed without the
restriction that the relevant events are independent since for any R and S,
there may be a pair of events E and F where R is the set of individual
estimates of P(E) and S is the set of individual estimates of P(F|E).

Theorem 2: A1-A3 and G1-G6 are incompatible.

Proof: $\sum_i a_i R_i S_i \ne \big(\sum_i a_i R_i\big)\big(\sum_i a_i S_i\big)$.
The non-equality is clear, but to give a simple example: if $a_i = 1/n$ for
every i, then $\sum_i a_i R_i S_i = \frac{1}{n}\sum_i R_i S_i$, whereas
$\big(\sum_i a_i R_i\big)\big(\sum_i a_i S_i\big) = \frac{1}{n^2}\big(\sum_i R_i S_i + \sum_{i \ne j} R_i S_j\big)$.
Equality would require $(n-1)\sum_i R_i S_i = \sum_{i \ne j} R_i S_j$. If
n = 2, this implies $\sum_i R_i S_i = \sum_{i \ne j} R_i S_j$, which holds
only if $R_1 = R_2$ or $S_1 = S_2$.

G6, with the other conditions except G3, implies that
$F(R) = \prod_i R_i^{a_i}$ with $\sum_i a_i = 1$. This follows directly from
Theorem 1 by setting $r_i = \log R_i$, $r = (r_1, \ldots, r_n)$, and
rewriting G6 as G'6, F'(r+s) = F'(r) + F'(s). Then Theorem 1 states
$F'(r) = \sum_i a_i r_i$. Taking the antilogarithm gives the result. From
the standpoint of the additivity of probabilities for exclusive events, the
only consistent aggregation function is the weighted average. From the
standpoint of the product rule for joint probabilities, the only consistent
aggregation function is the weighted product.
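The conflict between the additive condition and the product condition can be
seen with a pair of small numerical examples. The sketch below is
illustrative only: the weights, the two vectors of individual estimates, and
the function names `linear_pool` and `product_pool` are hypothetical. The
weighted average satisfies G2 and G3 but not G6; the weighted product
satisfies G6 but not G3.

```python
import math


def linear_pool(estimates, weights):
    """Weighted average of the individual estimates."""
    return sum(a * r for a, r in zip(weights, estimates))


def product_pool(estimates, weights):
    """Weighted product (weighted geometric mean) of the individual estimates."""
    return math.prod(r ** a for a, r in zip(weights, estimates))


# Hypothetical weights (summing to 1) and individual probability estimates.
weights = [0.5, 0.3, 0.2]
R = [0.2, 0.5, 0.1]
S = [0.4, 0.3, 0.6]
R_plus_S = [r + s for r, s in zip(R, S)]     # each component is at most 1
R_times_S = [r * s for r, s in zip(R, S)]

for pool in (linear_pool, product_pool):
    additive_gap = pool(R_plus_S, weights) - (pool(R, weights) + pool(S, weights))
    product_gap = pool(R_times_S, weights) - pool(R, weights) * pool(S, weights)
    print(pool.__name__, additive_gap, product_gap)
```

For each pooling function one of the two printed gaps is zero up to rounding
and the other is not, which is the content of the incompatibility.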

Whether Theorem 2 is to be considered a strong impossibility theorem for
probability aggregation is not completely clear cut. It certainly rejects
normal practice in applying the probability calculus. It states that, even
for independent events, it is not legitimate to obtain group estimates for two
events separately and then multiply these to arrive at the group estimate for
the joint occurrence. On the other hand, it could be contended that there is
nothing in A1-A3 which implies that this is the way in which probabilities
for joint occurrences are to be obtained, even for individual estimates. Thus,
Theorem 1 allows the procedure of first obtaining group estimates for all
absolute (non-relative) probabilities on an event space U, and then defining
all relative probabilities in the usual way. The multiplication rule would
then hold for all these derived probabilities.

Although  this  procedure  is  logically  impeccable,  it  has  the  awkward 
feature  that  pairs  of  events  which  every  member  of  the  group  consider  inde- 
pendent, may  not  be  independent  in  the  group  probability  distribution  on  U. 

And,  of  course,  the  derived  relative  probabilities  will  not  be  equal  to  the 
aggregates  of  the  individual  relative  probabilities. 

This appears to be a case where the Emerson principle may override
elementary logic. It certainly is desirable that each member of the group
make consistent probability estimates; otherwise we are somewhat at sea in
evaluating the individual estimates. It is perhaps even more desirable that
the group estimates form a consistent set, since computations will be made
with them, and if they are not consistent, large errors can arise by
"compounding" the inconsistencies.

Some light is shed on the issue here by reverting to the aggregation of
non-probabilistic magnitudes. Strictly speaking, the analogue of Theorem 1
holds for any additive quantity, such as length, weight, etc. For example,
if we wish to obtain the combined weight of a given object, e.g., the weight
of an envisaged spaceship, by estimating the weights of the components, then
the analogue of Theorem 1 would hold that the only consistent form of
aggregation for individual estimates must be a weighted average. So far, there
is no difficulty, since the magnitude weight does not involve anything
comparable to the multiplication rule for probabilities. However, if we want
to consider a multiplicative aggregate of two linear quantities, such as a
performance criterion consisting of the product of speed and payload, then an
analogous difficulty will arise. The aggregate of the product will not be the
product of the aggregates. In fact, this difficulty will hold for any
nonlinear combination of the two linear quantities.

It seems clear that choosing an aggregation procedure for quantities
which are subject to mathematical operations outside the scope of their
definitions requires criteria that will be incompatible with simple
consistency.


6.  Probabilistic  Aggregation 

In the preceding section we saw that there is no aggregation function
for probabilities that is consistent with a set of individual probabilities.
Armed with the Emerson principle, we do not have to remain content with this
result; i.e., we can still ask whether there is some way to aggregate a set
of probability estimates which is not consistent with the individual estimates,
but which performs well. This is a difficult topic to deal with on a general
level, since many of the more interesting results depend on special features
of particular score rules. We first examine some results with averages, which
show that the simple average is not too bad. To discuss these results, it
is useful to characterize the group in somewhat more detail than we have done
up to now. For most of this section, it is sufficient to think of the group
as an enterprise; that is, the group is characterized by a common decision
matrix $U_{ij}$, the utility to the group if action $A_i$ is taken and event
$E_j$ occurs, and it is taken for granted that the group will select some
common action $A_i$.


There is no problem involved with multiplication by a scalar (non-dimensional
number).
Concave scoring rules. Consider the case where the only group task is to
estimate a probability distribution on an event space, and where the group
utility can be represented by a concave score rule $S(R,j)$. A concave score
rule is one for which

$$S(aR + (1-a)R', j) \ge a\,S(R,j) + (1-a)\,S(R',j).$$

The quadratic score, $2R_j - \sum_k R_k^2$, and the logarithmic score,
$a \log R_j + b$, are examples of concave rules. In this case, if we examine
the objective expected payoff $OES_i$ that would be realized by following the
advice of individual i, we have

$$OES_i = \sum_j P_j\, S(R_i, j) \tag{27}$$

where, as usual, P is unknown. The weighted average of these expected
payoffs is

$$\sum_i a_i\, OES_i = \sum_i a_i \sum_j P_j\, S(R_i, j) = \sum_j P_j \sum_i a_i\, S(R_i, j) \tag{28}$$

If $S(R,j)$ is concave, we have

$$\sum_i a_i\, OES_i \le \sum_j P_j\, S(R_g, j) \tag{29}$$

where

$$R_g = \sum_i a_i R_i.$$

(29) asserts that for concave score rules, the average expected
score is always less than or at most equal to the expected score of the
average estimate. This is a fairly strong result, in that it does not
depend on the actual probability, and is true for any concave score rule,
and any set of weights $a_i$. Thus, for informational scores like the
logarithmic and the quadratic rules, the average of the individual estimates
will always produce an expected score at least as high as the average expected
score of the individuals.
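Inequality (29) can be spot-checked by simulation. The sketch below is illustrative only: it draws random "true" distributions P, random weights, and random member reports (all hypothetical data), and confirms for the quadratic and logarithmic scores that the weighted average of the members' objective expected scores never exceeds the objective expected score of the weighted-average report. The logarithmic score is taken with a = 1, b = 0.

```python
import math
import random

def quadratic_score(R, j):
    """Quadratic score 2*R_j - sum_k R_k^2."""
    return 2 * R[j] - sum(r * r for r in R)

def log_score(R, j):
    """Logarithmic score with a = 1, b = 0."""
    return math.log(R[j])

def expected_score(P, R, score):
    """Objective expected score sum_j P_j * S(R, j), as in (27)."""
    return sum(P[j] * score(R, j) for j in range(len(P)))

def random_distribution(rng, m):
    """Random strictly positive probability vector of length m."""
    raw = [rng.random() + 0.05 for _ in range(m)]
    total = sum(raw)
    return [x / total for x in raw]

def check_n_heads(score, trials=200, n=4, m=3, seed=0):
    """Return True if (29) held in every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        P = random_distribution(rng, m)          # unknown "true" distribution
        a = random_distribution(rng, n)          # weights summing to 1
        reports = [random_distribution(rng, m) for _ in range(n)]
        R_g = [sum(a[i] * reports[i][j] for i in range(n)) for j in range(m)]
        average_of_scores = sum(
            a[i] * expected_score(P, reports[i], score) for i in range(n))
        score_of_average = expected_score(P, R_g, score)
        if average_of_scores > score_of_average + 1e-12:
            return False
    return True
```

A passing check is not a proof; the result itself follows from Jensen's inequality applied to the concave function $S(\cdot, j)$ for each event j.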
The result, of course, specializes immediately to the non-weighted
average

$$\frac{1}{n} \sum_i OES_i \le \sum_j P_j\, S(\bar{R}, j), \qquad \bar{R} = \frac{1}{n} \sum_i R_i \tag{30}$$

For groups whose primary output is a set of estimates (e.g., consulting
firms) and for which the informational scores are a reasonable measure
of performance, (29) or (30) are basic aggregation formulae.

To illustrate this result, in Figure 43 the group realism curve for
the average of the individual probability estimates derived from the data of
Capen is presented. The lower solid curve is the individual realism curve
from Figure 9, Chapter II. The upper dashed curve is the relative frequency
of correct responses plotted against R, the average individual estimate. The
difference between the two curves is dramatic. Whereas for the most part
the relative frequency correct is lower than the estimated probability for
the individuals, for the group, if R > .7, the group is "always right."

If we take the conventional interpretation of the individual realism
curve, namely that individuals "overestimate" their information, then Figure
43 indicates that the group, defined as the average of the individual
responses, drastically "underestimates" its information.

The average individual quadratic score for the Capen data is .47. The
quadratic score for the "complete ignorance" estimate $R_j = .5$ is .5. Hence
the average score for the individuals is worse than if each individual had
answered every question by saying "I don't know." The best average individual
score was .643. The average score for the group response was .67. For
this data, the group score is better than the best individual score.

Non-concave scoring rules. For enterprises whose score rule is
not concave, the n-heads rule must be weakened somewhat. We assume that
the group decides to perform action $A_g$. There is no loss of generality in
assuming that there is some estimate $R_g$ for the probabilities for which
$A_g$ is optimal. Thus, selecting $A_g$ is equivalent to selecting some $R_g$
as the group estimate. Individual i, then, sees the expected group return as

$$EG_i = \sum_j R_{ij}\, S(R_g, j) \tag{31}$$

The weighted average of these expectations is

$$\sum_i a_i\, EG_i = \sum_i a_i \sum_j R_{ij}\, S(R_g, j) = \sum_j \sum_i a_i R_{ij}\, S(R_g, j) = \sum_j \bar{R}_j\, S(R_g, j) \tag{32}$$

where $\bar{R}_j = \sum_i a_i R_{ij}$. From the definition of a proper score,

$$\sum_j \bar{R}_j\, S(R_g, j) \le \sum_j \bar{R}_j\, S(\bar{R}, j) \tag{33}$$

so that

$$\sum_i a_i\, EG_i \le \sum_j \bar{R}_j\, S(\bar{R}, j) \tag{34}$$

Thus, the average weighted expectation of the members of the
group is maximized when the group uses as its estimate the weighted average
of the individual estimates. As in (30), this result specializes to the
non-weighted average

$$\frac{1}{n} \sum_i EG_i \le \sum_j \bar{R}_j\, S(\bar{R}, j) \tag{35}$$

where in this case

$$\bar{R}_j = \frac{1}{n} \sum_i R_{ij}.$$
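The maximization claim can be illustrated with a small numerical search. This is a sketch under stated assumptions: the weights and the three members' reports are invented for the example, and the quadratic and logarithmic scores stand in for "a proper score." The code evaluates the weighted average of the members' expectations, as in (32), at the weighted-average estimate and at many random candidate group estimates.

```python
import math
import random

def log_score(R, j):
    return math.log(R[j])

def quadratic_score(R, j):
    return 2 * R[j] - sum(r * r for r in R)

def average_member_expectation(weights, reports, R_g, score):
    """sum_i a_i * sum_j R_ij * S(R_g, j), the left side of (32)."""
    return sum(a * sum(r_ij * score(R_g, j) for j, r_ij in enumerate(r_i))
               for a, r_i in zip(weights, reports))

# Hypothetical weights a_i and reports R_ij for three members, three events.
weights = [0.5, 0.3, 0.2]
reports = [[0.7, 0.2, 0.1], [0.4, 0.4, 0.2], [0.1, 0.6, 0.3]]
R_bar = [sum(a * r_i[j] for a, r_i in zip(weights, reports)) for j in range(3)]

def beats_random_candidates(score, trials=500, seed=0):
    """True if no random candidate R_g scores above the weighted average."""
    rng = random.Random(seed)
    best = average_member_expectation(weights, reports, R_bar, score)
    for _ in range(trials):
        raw = [rng.random() + 0.01 for _ in range(3)]
        candidate = [x / sum(raw) for x in raw]
        value = average_member_expectation(weights, reports, candidate, score)
        if value > best + 1e-12:
            return False
    return True
```

The search only samples candidates, so it illustrates the inequality rather than proving it; the proof is the properness of the score rule applied to the distribution $\bar{R}$.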


Although (34) does not guarantee that the objective expectation of the
group is maximized when the group acts in accordance with the average estimate,
it does maximize the weighted average of the expectations of the individuals.
Thus, if the enterprise is an economic unit, where the payoff is in money, and
the weights represent a proportionate share of the return of the enterprise
going to each individual, then the average estimate maximizes the average
proportionate expectation.

There are several other ways to express what is essentially the same
result that clarify the import of (34). Suppose we examine the Monday morning
quarterbacking situation where each individual is paid according to how
the enterprise would have performed if it had followed his advice. The
individual then has an expectation of

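$$E_i = \sum_j R_{ij}\, S(R_i, j)$$

Similarly, he has an expectation of the return for individual k of

$$\sum_j R_{ij}\, S(R_k, j)$$

Thus i's expectation of the return to the entire group is

$$\sum_k a_k \sum_j R_{ij}\, S(R_k, j)$$

The weighted average of these expectations is

$$\sum_i a_i \sum_k a_k \sum_j R_{ij}\, S(R_k, j) = \sum_k a_k \sum_j \bar{R}_j\, S(R_k, j)$$

which again, by definition of a proper score,

$$\le \sum_k a_k \sum_j \bar{R}_j\, S(\bar{R}, j) = \sum_j \bar{R}_j\, S(\bar{R}, j) \tag{37}$$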
In this disaggregated case, the average expectation of the total group
return is maximized if each individual adopts the average estimate. For
example, if the group consists of a loose confederation of "independent"
operators, but each deals with the same basic decision situation, and, say,
they agree to pool their earnings and redivide (e.g., a group of individuals
betting separately on the same set of sports events, but pooling their
earnings), then the average expectation of the group return is maximized if all
use the same set of estimates, namely the weighted average, to make their
"individual" decisions.

A straightforward corollary of (37) obtains if we reformulate the payoff
in terms of regret. Define the regret of individual i as the difference
between what he thinks the group can obtain using his estimate, and what he
thinks will obtain using the group estimate. Then the weighted average
estimate will minimize the average of the individuals' regrets.

This series of decisional n-heads rules shows that weighted or
unweighted averages of the individual estimates do well compared to the
average expected performance of the individuals. To do much better than this,
it is necessary to take into account some additional properties of the
group.

7.  The Group as an Information System

In the opening section of this chapter, it was pointed out that, in
theory at least, each member of a group can be conceived as possessing a
certain stock of information, $I_i$, and a group estimation procedure can be
thought of as a method of pooling that information to arrive at a collective
answer to a question. A simple formal representation of this theory is to
assume there is a probability distribution on the event set $\{E_j\}$ given
the vector of individual information sets $I = (I_1, \ldots, I_n)$. There is
no difficulty in assuming that the events $E_j$ and $I_i$ are themselves
partitions of the general universe of discourse U, and the group information
set $I$ is just the logical product of the individual information sets.

Although this model is formally well defined, it suffers from the fact
that the $I_i$ are not observable.*

One potential approach is to treat the information sets as "intervening
variables;" i.e., to posit the existence of additional probability functions
$P(R_i|I_i)$ which relate an individual's information to his report, and
to formulate the group judgment, expressed say as $P(E_j|R.I)$, in terms of
these probabilities. If carried out rigorously, this approach becomes quite
complex.

It turns out that a theory can be generated which bypasses most of this
complexity, and which is isomorphic to the theory that would ensue starting
with the notion of information. The theory is generated by substituting the
observable item, the individual report $R_i$, for $I_i$, and the group report
$R$ for $I$.**

It might be worth pointing out that this duality between information
and reports (or more generally, between individual estimates and items of
information) is more widely applicable than the use made of it in this section.
For example, the duality can be used to explore the value of augmenting
individual estimates with additional information "fed in" during a group
estimation process.

* There are other difficulties in practice, mainly in trying to characterize
a universe of discourse that establishes a coherent structure for the
miscellaneous material evoked by asking an uncertain question.

** The resultant formalism is similar in many respects to signal theory, where
the individual reports $R_i$ are treated as messages, or signals, and the group
judgment is treated, as in signal theory, in terms of combining data from
several "channels."

Most  of  the  results  of  this  section  are  of  theoretical,  rather  than 
practical  interest.  However,  they  have  some  implications  for  practice.  In 
particular,  they  establish  certain  "ideal"  results  which  can  be  used  to 
evaluate  the  effectiveness  of  practical  aggregation  techniques.  For  example, 
it  will  be  shown  that  in  theory  the  group  is  more  accurate  than  the  most 
accurate  member.  In  order  to  achieve  this  desirable  result,  it  is  generally 
necessary  to  know  more  about  the  group  than  is  possible  in  practice.  However, 
knowing  this  result,  there  is  reason  to  be  discontented  with  procedures  where 
the  group  does  much  more  poorly  than  the  most  accurate  member. 

Suppose an individual announces a report R, i.e., he says "The
probability of event E is R." We can treat R as a probability judgment, or we
can treat it more cavalierly as a simple datum, and ask, "If individual i says
R, what is the probability of event E?" The question assumes there is a
probability function $P(E|R)$ which relates the report with the occurrences
of E.

Conceptually, R is not necessarily an assertion. It could consist of a nod
of the head or a wave of the hand. However, since we want to apply the theory
to the aggregation of probability statements, we will assume that the reports
are probability statements.

The chief freedom allowed by treating reports as signals rather than
as estimates is that the reports do not necessarily have to fulfill the
probability postulates. Thus, initially at least, we do not run into the
problems associated with calibration. Nor do we have to worry about
consistency. Of course, if $P(E|R)$ were not roughly monotonic in R, we would
think that our estimator was a bit strange. But as we have seen in the
discussion of counterprediction, for some questions and some individuals,
$P(E|R)$ is, as a matter of fact, not monotonic in R.

In addition to the probability function $P(E|R)$, we assume there is a
prior probability that the individual will report R. This prior probability is
relative to the question being asked, which we will assume is simply to
estimate the probability distribution on U, with a given partition $\{E_j\}$.
Thus, there is a response space (all permissible probability distributions on
U) and a probability distribution $P(R)$ over these distributions. We can
interpret $P(R)$ as in the theory of errors as arising from "random error;"
that is to say, there is some average R which the individual "aims at," but
chance influences lead him to say something else. For the time being, we will
assume that $P(E|R)$ is a function of the total report. To be more precise, in
the general case the individual report consists of the probability assignment
$R_{ij}$, where i denotes the individual and j denotes the event $E_j$. Let
$R_i$ stand for the set $\{R_{i1}, \ldots, R_{im}\}$. Then, in general we allow
the possibility that the probability for a given event, $P(E_j|R_i)$, depends
on the entire report $R_i$. Otherwise, we would run into the problem of
calibration, i.e., if $P(E_j|R_i) = f(R_{ij})$ then $P(E_j|R_i) = R_{ij}$, if
the individual is consistent in his estimates.

In addition to the response space $R_i$ and probability function
$P(E_j|R_i)$ for individuals, we assume there is a joint response space
$R = (R_1, \ldots, R_n)$ for a set of n individuals (the group), and a joint
probability function $P(E_j|R)$, which expresses the probability that the
event $E_j$ will occur if the group says R. There is also a joint prior
probability distribution $P(R)$ on the group report. In the spirit of the
theory of errors, we might assume that the joint distribution $P(R)$ is simply
the product of the individual distributions $P(R_i)$; i.e.,
$P(R) = \prod_i P(R_i)$. If the individual distributions are assumed to be the
result of "purely random" variations on the part of the individuals, this
assumption would be reasonable. However, it will turn out that this
assumption is not required for some of the most interesting consequences of
the theory, so it will be postponed.

In some general sense, the function $P(E_j|R)$ is an aggregation function
for the set of reports R. $P(E_j|R)$ is not necessarily a function of the
reports $R_{ij}$ alone. In particular, it may reflect interactive effects among
the reports, a topic that will be dealt with later in this section.

To complete the analysis, we assume that the aggregation problem
arises within some context which will be labeled A (for a-priori), in which
the responses R are generated. Based on whatever is known prior to the
responses, there is an a-priori distribution on the events $E_j$ which could be
denoted by $P(E_j|A)$. However, since the context A is part of the "total
scene," and the term A would appear as an antecedent in all probability
expressions, we will suppress it. Thus for $P(E_j|A)$ we will write simply
$P(E_j)$. Rather than $P(E_j|R.A)$ we write $P(E_j|R)$, etc.

Notice that the situation is entirely "objective." Some stimulus, e.g.,
asking a question, generates the responses R. The a-priori probabilities
$P(R_i)$, $P(R)$ and $P(E_j)$, as well as the a-posteriori probabilities
$P(E_j|R_i)$ and $P(E_j|R)$, are assumed to be properties of the situation, and
are not estimates.

Of course, in order to apply the analysis, it is necessary to know these
probabilities, but that is a different topic, to be pursued below.
Within this framework we can ask a number of pertinent questions:
(1) How do the individual expected scores compare with the a-priori score?
(2) How does the group score compare with the a-priori score? (3) How does
the group score compare with any individual score? (4) What is the effect
on the group score of adding new members to the group?

The expected score, based on the situation prior to the reports, is
just $\sum_j P(E_j)\, S(P(E), j)$, where $P(E)$ represents the distribution
$\{P(E_j)\}$. We can call this the a-priori score, AS. It is the score that
would be expected if the probabilities $P(E_j)$ were taken as estimates.

To evaluate the individual scores, we consider

$$\sum_{R_i} P(R_i) \sum_j P(E_j|R_i)\, S(R_i, j) \tag{38}$$

Here we compute the total expected score, summing over all the possible
responses of individual i. This could be called the before the fact
expectation, i.e., it is the expectation before a specific $R_i$ has been
announced. Each term $\sum_j P(E_j|R_i)\, S(R_i, j)$ could be called an after
the fact expected score since it is the score computed after the individual
has announced $R_i$.

From the rule of elimination (F3, Chapter II), we have

$$P(E_j) = \sum_{R_i} P(R_i)\, P(E_j|R_i)$$

Substituting this in the expression for the a-priori score, we obtain

$$AS = \sum_{R_i} P(R_i) \sum_j P(E_j|R_i)\, S(P(E), j)$$
From the definition of a proper score, then,

$$AS \le \sum_{R_i} P(R_i) \sum_j P(E_j|R_i)\, S(P_i, j) \tag{39}$$

where $P_i$ is shorthand for the distribution $\{P(E_j|R_i)\}$.

(39) states that the total expected score for the estimate $\{P(E_j|R_i)\}$
is at least as great as the expected score a-priori. This is true before the
fact, i.e., before the report $R_i$ is announced. At first glance it might seem
peculiar that the expected individual score is greater than the a-priori
score, since, on one interpretation, the various reports $R_i$ arise at random.
The significant feature of the model, however, is that $P(E_j|R_i)$ is a
function of $R_i$. Thus, once $R_i$ has been announced, the probability of
$E_j$ changes.

(39) answers one of the questions raised earlier. The expected score
for each individual based on the probability distributions $P(E_j|R_i)$ is
greater than or equal to the a-priori score. Notice this is not the same thing
as the expected score based directly on the reports. In general it will not be
the case that

$$\sum_{R_i} P(R_i) \sum_j P(E_j|R_i)\, S(P_i, j) = \sum_{R_i} P(R_i) \sum_j P(E_j|R_i)\, S(R_i, j).$$

And of course it is quite different from any subjective expectations the
individual might have. The equality would occur only when the individual is
completely realistic, i.e., when $P(E_j|R_i) = R_{ij}$.

Exactly the same line of reasoning that led to (39) can be used to
demonstrate that

$$AS \le \sum_R P(R) \sum_j P(E_j|R)\, S(P, j) \tag{40}$$

where P is shorthand for the distribution $\{P(E_j|R)\}$. That is, the average
expected score for the group is always greater than or equal to the a-priori
score, when the probability distribution $P(E_j|R)$ is assumed to be the report.
Of course, the same comments concerning before the fact and after the fact
hold for the group report as were made for the individual reports.
The third question, how does the group score compare with any individual
score, can be answered by a similar line of reasoning, but with some
additional notation. Let $R_{-i}$ represent the vector of n-1 reports omitting
$R_i$. From the rule of elimination we have

$$P(E_j, R_i) = \sum_{R_{-i}} P(E_j, R) \quad \text{with } R_i \text{ fixed}$$

$$= \sum_{R_{-i}} P(E_j|R)\, P(R)$$

and dividing both sides by $P(R_i)$

$$P(E_j|R_i) = \sum_{R_{-i}} P(E_j|R)\, P(R_{-i}|R_i) \tag{41}$$

Substituting from (41) in the right hand side of (39), we have

$$\sum_{R_i} P(R_i) \sum_j \sum_{R_{-i}} P(E_j|R)\, P(R_{-i}|R_i)\, S(P_i, j) \tag{42}$$

Rearranging, and noting that $P(R_i)\, P(R_{-i}|R_i) = P(R)$, we have

$$\sum_R P(R) \sum_j P(E_j|R)\, S(P_i, j) \le \sum_R P(R) \sum_j P(E_j|R)\, S(P, j) \tag{43}$$

In short, the average expected score for the group is always greater than or
equal to the average expected score of any member of the group.
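The chain "a-priori score ≤ individual score ≤ group score" from (39), (40) and (43) can be checked on a toy model. The sketch below is an illustration only: it assumes a small, randomly generated joint distribution over one two-valued event and two members whose reports take three discrete values (treated purely as signals), and scores the objective posteriors $P(E_j|\cdot)$ with the log and quadratic rules. The function names are not from the text.

```python
import itertools
import math
import random

def make_joint(rng, m=2, k=3, n=2):
    """Random joint P(E = j, R = r) as a dict keyed by (j, r)."""
    keys = [(j, r) for j in range(m)
            for r in itertools.product(range(k), repeat=n)]
    raw = [rng.random() + 0.01 for _ in keys]
    total = sum(raw)
    return {key: x / total for key, x in zip(keys, raw)}

def average_expected_score(joint, m, view, score):
    """Before-the-fact expected score when the estimate is P(E | view(r)).

    `view` maps the full report vector r to the part the estimator sees.
    """
    cells = {}
    for (j, r), p in joint.items():
        cells.setdefault(view(r), [0.0] * m)[j] += p
    total = 0.0
    for mass in cells.values():            # mass[j] = P(E_j, seen)
        seen = sum(mass)
        posterior = [x / seen for x in mass]
        total += sum(mass[j] * score(posterior, j) for j in range(m))
    return total

def log_score(P, j):
    return math.log(P[j])

def quadratic_score(P, j):
    return 2 * P[j] - sum(p * p for p in P)

def score_chain(score, seed=1):
    """Return (a-priori score, [member scores], group score)."""
    joint = make_joint(random.Random(seed))
    a_priori = average_expected_score(joint, 2, lambda r: (), score)
    members = [average_expected_score(joint, 2, lambda r, i=i: r[i], score)
               for i in range(2)]
    group = average_expected_score(joint, 2, lambda r: r, score)
    return a_priori, members, group
```

Each step conditions on a finer view of the reports, which is the "refinement" argument of the text in computational form.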

The logic of this demonstration is actually the same as used to show
that either the individual or the group has a higher average expected score
than the a-priori score. The estimates of the additional members of the
group act as a refinement of the estimate of any member of the group, and
hence the average expected score of the total group is greater than that of
any member.

The same sequence of steps can be invoked to show that the addition
of a new member to a group never reduces the average expected score.
In a previous publication I stated that probabilistic aggregation is
risky, in the sense that the expected score of the group can be either greater
than or less than the a-priori score. That statement was based on an
analysis of after the fact scores, and was correct for that case. However, as
the above derivation shows, the average expected score before the fact is not
risky, in the sense that it is always greater than or equal to the a-priori
score. Similar comments hold with regard to the comparison between the group
and individuals.

This sequence of results puts in precise form a number of "obvious"
features of group judgment. Since the group encompasses at least as much
information as any member, theoretically, it should do at least as well as the
best member. Similarly, if any new member is added, he cannot detract from the
information already available to the group, and hence should not be
counterproductive.

However, these statements hold only for the objective probabilities
$P(E_j|R_i)$ and $P(E_j|R)$. They do not hold for the estimates $R_i$ and any
particular aggregation of R. In order to capitalize on (43), for example, it
is necessary to know the function $P(E_j|R)$. By and large, it is not possible
to compute $P(E_j|R)$, even if the $P(E_j|R_i)$ are known. It is even less
feasible to compute $P(E_j|R)$ if only the $R_i$ are known.

To explore this a little further, we can "unpack" $P(E_j|R)$ in terms of
the $P(E_j|R_i)$ and some related probabilities.

For the moment, we will drop the subscript on $E_j$ to streamline the
notation. It will reappear when we reevaluate the expected group score.

We define two auxiliary notations:

$$D_R = P(R) \Big/ \prod_i P(R_i) \tag{44}$$

$D_R$ measures the degree of dependency among the individual reports $R_i$.
$D_R = 1$ means that the reports are independent; the probability of a given
conjunction is the product of the individual probabilities.

$$D_R^E = P(R|E) \Big/ \prod_i P(R_i|E) \tag{45}$$

$D_R^E$ measures the event related dependency among the reports. $D_R^E = 1$
means that, assuming the event E occurs, the probability of the joint report R
is just the product of the probabilities of the individual reports.

Starting with the rule of the product

$$P(E|R) = P(R|E)\, P(E) / P(R)$$

and substituting for $P(R)$ from (44) and for $P(R|E)$ from (45)

$$P(E|R) = \frac{D_R^E}{D_R} \cdot \frac{\prod_i P(R_i|E)\; P(E)}{\prod_i P(R_i)} \tag{46}$$

Finally, invoking the rule of the product for the individual reports, we
have

$$P(E|R) = \frac{D_R^E}{D_R} \cdot \frac{\prod_i P(E|R_i)}{P(E)^{n-1}} \tag{47}$$

(47) displays the probability of interest, namely $P(E|R)$, as a function of
the individual probabilities $P(E|R_i)$, the a-priori probability $P(E)$, and
the two dependency terms.
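Since (47) is an identity, it can be verified on any joint distribution. The following sketch assumes a randomly generated joint distribution over a two-valued event and three binary reports (hypothetical data, with reports treated as signals), computes $P(E|R)$ directly, and compares it with the right-hand side of (47) built from $D_R$, $D_R^E$, the individual posteriors and the prior.

```python
import itertools
import math
import random

rng = random.Random(0)
n, k = 3, 2                                   # three members, binary reports
report_vectors = list(itertools.product(range(k), repeat=n))
raw = {(e, r): rng.random() + 0.01 for e in (0, 1) for r in report_vectors}
total = sum(raw.values())
joint = {key: v / total for key, v in raw.items()}   # P(E = e, R = r)

def prob(event=None, fixed=None):
    """P(E = event, R_i = v for (i, v) in fixed); omitted parts are summed out."""
    fixed = fixed or {}
    return sum(p for (e, r), p in joint.items()
               if (event is None or e == event)
               and all(r[i] == v for i, v in fixed.items()))

def max_discrepancy(event=1):
    """Largest |P(E|R) - right side of (47)| over all report vectors."""
    p_e = prob(event)
    worst = 0.0
    for r in report_vectors:
        full = dict(enumerate(r))
        p_r = prob(fixed=full)
        direct = prob(event, full) / p_r                      # P(E|R)
        d_r = p_r / math.prod(prob(fixed={i: v}) for i, v in full.items())
        d_re = (prob(event, full) / p_e) / math.prod(
            prob(event, {i: v}) / p_e for i, v in full.items())
        individual = math.prod(                               # prod_i P(E|R_i)
            prob(event, {i: v}) / prob(fixed={i: v}) for i, v in full.items())
        via_47 = (d_re / d_r) * individual / p_e ** (n - 1)
        worst = max(worst, abs(direct - via_47))
    return worst
```

The check confirms the algebra only; it says nothing about how $D_R^E$ could be estimated in practice, which is the difficulty the text goes on to discuss.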

From our previous discussions of the distribution $P(R)$ it seems to be a
reasonable assumption that $D_R = 1$, especially for a group process in which
the individual reports are collected separately and anonymously. There is no
simple way that I know to assess the event related dependency $D_R^E$.

Since (47) contains the product $\prod_i P(E|R_i)$, it is convenient to use
the log score to assess the group performance.
Reinstating the subscripts for the events, we can compute the average
expected log score $AES_g$ for (47). Setting $D_R = 1$ for all R we have

$$AES_g = \sum_R P(R) \sum_j P(E_j|R)\, \log \frac{D_R^{E_j} \prod_i P(E_j|R_i)}{P(E_j)^{n-1}} \tag{48}$$

Since the terms $P(E_j)$ do not involve R, expansion of (48) gives

$$AES_g = -(n-1)\,AS + \sum_R P(R) \sum_j P(E_j|R) \sum_i \log P(E_j|R_i) + \bar{D} \tag{49}$$

AS is the a-priori score defined earlier. $\bar{D}$ is the average expectation
of the logarithm of the event related dependency $D_R^{E_j}$. Applying the same
expansion as (42) to the central expression in (49), we arrive at

$$AES_g = -(n-1)\,AS + \sum_i \sum_{R_i} P(R_i) \sum_j P(E_j|R_i)\, \log P(E_j|R_i) + \bar{D} \tag{50}$$

Calling the average expectation of individual i $AES_i$,

$$AES_g = -(n-1)\,AS + \sum_i AES_i + \bar{D} \tag{51}$$

Finally, it is convenient to introduce the notion of net score, namely the
difference between the expected score of an individual or the group, and the
a-priori score. The net score measures the improvement (or loss) due to
employing the estimate rather than simply asserting the a-priori probabilities.
Thus the net score of the group is $NAES_g = AES_g - AS$, and the net score of
an individual i is $NAES_i = AES_i - AS$.

From (51),

$$NAES_g = \sum_i NAES_i + \bar{D} \tag{52}$$

The net score of the group is precisely the sum of the net scores of the
individuals plus the expected dependency term.
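Relation (52) can be confirmed on a constructed example in which $D_R = 1$ holds exactly. The sketch below assumes two members whose binary reports are generated independently (so $P(R) = P(R_1)P(R_2)$), and an invented table `p_event1` giving $P(E_1|R)$; all numbers are hypothetical. It computes the net log scores and the expected log dependency $\bar{D}$ from the joint distribution.

```python
import itertools
import math

p_one = [0.3, 0.6]      # P(R_i = 1); reports are independent, so D_R = 1
p_event1 = {(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.6, (1, 1): 0.2}  # P(E_1 | R)

joint = {}              # joint[(j, r)] = P(E_j, R = r)
for r in itertools.product((0, 1), repeat=2):
    p_r = math.prod(p_one[i] if r[i] else 1 - p_one[i] for i in range(2))
    joint[(1, r)] = p_r * p_event1[r]
    joint[(0, r)] = p_r * (1 - p_event1[r])

def marginal(j=None, i=None, v=None):
    """P(E_j), P(R_i = v) or P(E_j, R_i = v), depending on the arguments."""
    return sum(p for (e, r), p in joint.items()
               if (j is None or e == j) and (i is None or r[i] == v))

def net_scores():
    """Return (NAES_g, [NAES_1, NAES_2], D_bar) for the log score."""
    a_priori = sum(marginal(j) * math.log(marginal(j)) for j in (0, 1))
    aes_group, aes_member, d_bar = 0.0, [0.0, 0.0], 0.0
    for (j, r), p in joint.items():
        p_r = joint[(0, r)] + joint[(1, r)]
        aes_group += p * math.log(p / p_r)              # log P(E_j | R)
        dependency = p / marginal(j)                    # P(R | E_j)
        for i in range(2):
            p_ji = marginal(j, i, r[i])                 # P(E_j, R_i = r_i)
            aes_member[i] += p * math.log(p_ji / marginal(i=i, v=r[i]))
            dependency /= p_ji / marginal(j)            # divide by P(R_i | E_j)
        d_bar += p * math.log(dependency)               # log D_R^{E_j}
    return (aes_group - a_priori,
            [x - a_priori for x in aes_member],
            d_bar)
```

In this construction the dependency term is whatever the table `p_event1` implies; changing the table changes the sign and size of $\bar{D}$ while (52) continues to hold.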


The net expected score $NAES_i$ of each individual is positive. Hence,
$\sum_i NAES_i$ is larger than the net expected score of any individual.
However, $\bar{D}$ is not necessarily positive. We know from (40) that
$NAES_g$ is positive, but it need not be as large as $\sum_i NAES_i$.

8.  Approximations

In practice, it is rare that enough is known to apply formulas like
(43) or (52). In particular, the event-related dependence $D_R^{E_j}$ is
difficult to express in terms of data that is likely to be available, and is
"non-intuitive" when it comes to making a judgmental estimate. But in
addition, the a-priori probabilities $P(E_j)$ are usually poorly known, as are
the individual probabilities $P(E_j|R_i)$. Often, about all that can be said
concerning the individual probabilities is something like "i is a good man in
his field;" which is a long way from determining the probability that a given
event will occur if i says $R_i$.

As we saw in the preceding chapter, a common approach, given such a
dearth of information, is to rely on some "plausible" or nominal assumptions.
The assumption $D_R = 1$ is plausible, based on the assumption that much of the
variation in an individual's report is due to "random" influences. The
assumption is reinforced if the responses of the individuals are anonymous,
and hence presumably independent. This particular assumption can be
sidestepped; it is possible to reformulate (47) in a way that does not involve
$D_R$. Since $\sum_j P(E_j|R) = 1$, we can divide (47) by $\sum_j P(E_j|R)$
to obtain

[Equation (53): the display is not recoverable from the source.]

where [symbol illegible, superscript $E_j$] is a kind of relative dependency
term, no easier to estimate than [...] all events, (53) does not contain the
$D_R$ term.

The traditional assumption of equal a-priori probabilities would seem
to have as much justification in group estimation as it does in more
conventional statistical inference. An additional tempting assumption is
$D^{E_j} = 1$. Coupled with the assumption of equal a-priori probabilities,
$D^{E_j} = 1$ implies

$$P(E_j|R) = \frac{\prod_i P(E_j|R_i)}{\sum_k \prod_i P(E_k|R_i)} \tag{54}$$

Finally, if we assume that all the individuals are realistic, i.e.,
$P(E_j|R_i) = R_{ij}$, (54) becomes

$$P(E_j|R) = \frac{\prod_i R_{ij}}{\sum_k \prod_i R_{ik}} \tag{55}$$


(54)  and  (55)  are  pleasantly  simple  formulae.  If  the  assumptions 

leading  to  them  could  be  justified,  the  aggregation  problem  would  be  well  in 

hand.  There  are  reasons  for  thinking  (54)  and  (55)  are  oversimplified. 

For the two event case (i.e., the case where the event space consists
of an event $E$ and its complement $\bar{E}$) the joint assumption
$D_R = D^{E} = 1$ leads to the consequence that $P(E|R_i) = P(E)$ for all
$R_i$ but one.* In short, for all but one of the respondents, their estimates
add no new information beyond the a priori information.

* The demonstration is given in Appendix III.

A related difficulty is that for large groups, assuming (55) implies
that almost all group estimates will be essentially 0 or 1. Thus for a
group with thirty members, if the average response is .55 or greater for one
alternative, $P(E|R) \approx .998$; if the average response is greater than .6,
$P(E|R) \approx .99999$. Since it seems unlikely that for questions of the sort
where group estimation is appropriate, the group knows enough to justify
estimates of 0 or 1 for almost all questions, the independence assumptions
are probably too optimistic for large groups.
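The size of this effect is easy to check numerically. The following is a minimal sketch, assuming the normalized product form of (55) and assuming, for simplicity, that all thirty members give the same report (the text speaks only of the average response). The function name `normalized_product` is illustrative and not part of the original report.

```python
from math import prod


def normalized_product(reports):
    """Normalized product of individual probability vectors.

    `reports` is a list with one probability vector per individual,
    all over the same set of exclusive events.
    """
    n_events = len(reports[0])
    # Product of the individual reports for each event.
    products = [prod(r[j] for r in reports) for j in range(n_events)]
    total = sum(products)
    # Normalize so the group estimates sum to 1.
    return [p / total for p in products]


# Thirty members, two events, everyone reporting the same value (an assumption).
group_55 = normalized_product([[0.55, 0.45]] * 30)
group_60 = normalized_product([[0.60, 0.40]] * 30)

print(group_55[0])  # close to .998
print(group_60[0])  # above .99999
```

Under these assumptions a mild individual leaning of .55 is turned into near certainty by the product rule, which is the behavior the paragraph above describes.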

A potential compromise is the geometric mean. The geometric mean
retains the multiplicative character of (48), but is less "extreme" than the
normalized product. It is also, in a way, a compromise with respect to the
impossibility result derived in Section 5. There it was shown that the only
function of a set of probabilities which is multiplicative is a weighted
product with weights adding up to 1. However, a weighted product would not
add up to 1 for exclusive events. Normalizing a weighted product to make
it sum to 1 produces a generalization of the normalized geometric mean; the
normalized geometric mean is, in fact, the normalized weighted product with
equal weights $1/n$.

Some additional support for the geometric mean comes from the likelihood
that the prior distribution of responses $P(R_i)$ is skewed due to the
constraint that $R_i$ is between 0 and 1.* A glance at Fig. 18, Chapter II,
illustrates this point. If the prior distributions are roughly log normal,
then the theory of errors would suggest that the geometric mean is an
appropriate aggregation function.

* The product formulation has a related "edge effect." If one of the responses
is zero for a given alternative, then the group response is zero,
independently of the other responses. If one individual reports zero for one
alternative, and some other individual reports zero for its complement, then
the product approximation is completely degenerate; both probabilities are
zero. This edge effect can be dealt with by a suitable truncation; however,
the results will be highly sensitive to the nature of the truncation,
especially for large groups.
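The sensitivity of the product rule to truncation can be illustrated with a small sketch. It assumes a two-event case, the normalized product form, and one particular truncation (clipping each report to the interval [eps, 1 - eps]); the report does not specify a truncation, so this choice and the name `truncated_product` are hypothetical.

```python
from math import prod


def truncated_product(reports, eps):
    """Normalized product for an event and its complement.

    Each report is first clipped to [eps, 1 - eps] so that a single
    report of 0 or 1 cannot force the group estimate on its own.
    """
    clipped = [min(max(r, eps), 1.0 - eps) for r in reports]
    p_event = prod(clipped)
    p_complement = prod(1.0 - r for r in clipped)
    return p_event / (p_event + p_complement)


# Twenty-nine members report .6 and one member reports 0.
reports = [0.6] * 29 + [0.0]

coarse = truncated_product(reports, eps=1e-2)  # near 1
fine = truncated_product(reports, eps=1e-6)    # well below 1/2

print(coarse, fine)
```

With the same thirty reports, two equally defensible truncation levels give group estimates on opposite sides of one half, which is the kind of sensitivity the footnote warns about.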


The geometric mean has a particularly straightforward n-heads rule
using the logarithmic score as a figure of merit. We have

$$OES_g = \sum_j P_j \log \frac{\prod_i R_{ij}^{1/n}}{\sum_k \prod_i R_{ik}^{1/n}} = \sum_j P_j \log \prod_i R_{ij}^{1/n} + C, \quad \text{where } C = -\log \sum_k \prod_i R_{ik}^{1/n} \tag{57}$$

Since C is not a function of j, it is a constant, depending only on the
$R_{ij}$. Rearranging (57), we have

$$OES_g = \frac{1}{n} \sum_i OES_i + C \tag{58}$$

In words, the objective expected score of the geometric mean is the average
of the individual objective scores plus a constant term. Since
$\sum_k \prod_i R_{ik}^{1/n} \le 1$, its logarithm is non-positive, and C is
non-negative. Thus the advantage of the group score over the average of the
individual scores is independent of the objective probability and depends
only on the amount of disagreement within the group.* It can be computed
immediately knowing the $R_{ij}$. C = 0 if and only if all the individual
reports are the same.
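The identity (58) can be verified numerically. The sketch below assumes the logarithmic score with natural logarithms and the normalized geometric mean as the group estimate; the report vectors and objective probabilities are made-up illustrative numbers, and `geometric_mean_rule` is a hypothetical helper name.

```python
from math import log, prod


def geometric_mean_rule(reports, objective):
    """Return (OES_g, average of OES_i, C) for the normalized geometric mean.

    `reports` holds one probability vector per individual; `objective`
    holds the objective probabilities P_j of the events.
    """
    n = len(reports)
    m = len(objective)
    # Unnormalized geometric mean of the reports for each event.
    gm = [prod(r[j] for r in reports) ** (1.0 / n) for j in range(m)]
    total = sum(gm)
    group = [g / total for g in gm]
    # Objective expected logarithmic scores.
    oes_group = sum(p * log(q) for p, q in zip(objective, group))
    oes_individuals = [
        sum(p * log(r[j]) for j, p in enumerate(objective)) for r in reports
    ]
    c = -log(total)
    return oes_group, sum(oes_individuals) / n, c


reports = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2], [0.2, 0.5, 0.3]]
objective = [0.6, 0.3, 0.1]

oes_g, mean_oes_i, c = geometric_mean_rule(reports, objective)
print(oes_g, mean_oes_i + c)  # the two values agree up to rounding
```

Changing `objective` changes both scores but leaves `c` untouched, since `c` is computed from the reports alone.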

* This feature is manifested even more clearly for the comparable n-heads rule
for the quadratic score and the mean as the aggregation function. In this
case $OES_g = \frac{1}{n} \sum_i OES_i + \sum_j S_j^2$, where $S_j^2$ is the
variance of the individual reports for event j.

In Figure 44 a subset of the Capen data is plotted, comparing the
performance of the mean and the performance of the geometric mean for 18
subjects on the 120 questions. Two features of the realism curve for the
geometric mean are noteworthy: a) The curve is closer to the 45° (fully
realistic) line than the corresponding curve for the mean. b) The group
estimates have been displaced upward, i.e., toward higher estimates. Notice
that for the .85 and .95 estimates, there were 13 cases for the mean and 41
cases for the geometric mean. The average quadratic score for the mean is
.646; for the geometric mean it is .704. For this particular set of data,
then, the geometric mean performs much better than the mean, and decidedly
better than the best member of the group (average score .63).

[Figure 44. Individual and Group Calibration, n = 18, 120 Questions
(Data from Capen). Axes: proportion correct versus estimate.]

The  advantage  of  the  geometric  mean  over  the  mean  in  this  set  of  data 
results  from  the  fact  that  the  realism  curve  for  the  mean  lies  above  the  45° 
line.  The  geometric  mean  generates  an  estimate  that  is  more  extreme  than  the 
mean  — i.e.,  it  is  closer  to  0 or  1.  If  the  realism  curve  for  the  mean  had 
been  below  the  45°  line,  the  geometric  mean  would  have  performed  more  poorly 
than  the  mean. 

One set of data is hardly a sufficient basis for any firm conclusions.
About the most that can be said at present is that the geometric mean is a
rough approximation to the "ideal" aggregation formula (53) and that it gives
surprisingly good results for the one case investigated.


The selection of a subset of 18 subjects for this analysis was accidental.
The 18 subjects were members of a UCLA Executive program for mid-career
engineers.
CHAPTER VI. GROUP VALUES


1. Individual Utilities

The aggregation of value judgments presents an entirely different
conceptual problem from the aggregation of factual judgments. For factual
judgments we have the simplifying feature that figures of merit are the same
for individual estimates as they are for group estimates. Thus, even in the
face of logical difficulties, such as possible inconsistencies between
individual and group estimates, it seems reasonable to prefer a group judgment
over an individual judgment if the former is likely to be more accurate. In
the present state of the art there is no comparable criterion for group value
judgments. The question whether it is meaningful to speak of figures of
merit for individual value judgments is still somewhat controversial. But
even if the notion of figure of merit for individual value judgments were
sharply defined, the same figure of merit would not apply to group judgments,
except for some specialized cases such as the fixed share partnership.

The major emphasis in this chapter is on group values; but some
attention must be paid to individual values as inputs to group decisions.
Most of the conceptual apparatus needed has already been presented in
Chapter II in the theory of probability estimates. In fact, it is only a
small step from postulates P1-P8 of that theory to the theory of individual
numerical value, usually called the theory of utility.

A major stumbling block in the theory of individual values is the lack
of a well-defined figure of merit. Many decision theorists either implicitly
or explicitly adopt the view that a figure of merit can be based on the tie
between estimates of value and choice behavior. Thus, an estimate of the
form "A is better than B" is considered correct (for a given individual) if,
when presented with a free choice between A and B, the individual selects A
rather than B. This approach has led many economists to insist that the only
"true" meaning to value statements is the correlative choice behavior. Hence,
individuals should not be asked what their preferences are, but rather, the
preferences should be deduced from their choices. One variant of this attitude
is the doctrine of revealed preferences: individuals express their value
judgments most directly in their market behavior.

Several things are scrambled in the revealed preference approach. As
with all estimates, assuming there is a figure of merit, judgments concerning
preferences are subject to error. Hence, statements of preference should be
treated with the same caution as any other kind of estimate. On the other
hand, choices such as market behavior are complex phenomena with cognitive
elements playing a role. If a choice reflects not only "pure preference" but
other types of estimates, such as estimates of probabilities, then a choice
can be "mistaken" if the ancillary estimates are incorrect. Hence, it is not
clear that choice behavior, especially market behavior, is always a reliable
source of figures of merit for value judgments.

Despite these caveats, the notion that choice behavior is the "proper"
objective correlate of value judgments has a great deal to recommend it.
Other attempts to define a correlate (e.g., internal states of the individual
or feeling tones) have not reached a level of precision that would allow
useful figures of merit.* In the following, then, it will be assumed that
choice is the most useful concomitant of preferences for decision theory.

* It is possible that a quite different mechanism, namely the phenomenon of
reinforcement, could furnish a more diagnostic approach for objectifying
value judgments. A characteristic of a situation which reinforces behavior
(increases the probability of an associated act) might be considered a value
in that situation. However, so far as I know, reinforcement has not as yet
been used as the basis for a theory of decision.


To recapitulate some of the material in Chapter II, we assume there is
a set X of situations, among which are contingencies of the form (x, y | E),
and the individual has a complete preference relation over X. The preference
relation obeys the principles of [illegible in source] and sure-thing for
contingencies. There is at least one event with probability 1/2, independent
on repetition, and the set of events generated by repetitions of this event
is archimedean. These assumptions lead to the consequence that there is a
numerical scale of probabilities on the events that is additive for exclusive
events.

For the purpose of introducing numerical utilities, it is convenient to
modify the archimedean axiom, P8, to

P8'. If x > y > z then there is an event E such that
y ~ (x, z | E)

P8' states that if y is intermediate in preference between x and z, then
there is some contingency involving only x and z which is equivalent to it.
This axiom is usually stated in the form: given the hypothesis, there is
some probability p, such that the contingency x with probability p, z with
probability 1-p is equivalent to y. What is usually left unstated in the
axiom in this form is that the probability p is generated by a random device,
independent on repetition. Armed with Theorem 7, Chapter II, it is not
necessary to deal with contingencies of this restricted form.

P8' generates a mapping U(x) of the set X onto the real numbers. This
is accomplished as follows: Choose any pair of situations x and y, where
x > y. Set U(x) = 1, U(y) = 0. For any z, if x > z > y, U(z) = P(E), where
z ~ (x, y | E). If z > x, then U(z) = 1/P(E) where x ~ (z, y | E). Finally, if
x > y > z, U(z) = P(E) / (P(E) - 1), where y ~ (x, z | E).

* With a slight modification of the combining axiom P3, P8' would do as well
as P8 for completing the numerical theory of probability; however, the
definition of numerical probabilities would be somewhat more complicated.

It  is  straightforward,  but  tedious,  to  prove  the  following  theorem: 

THEOREM 1. (Von Neumann-Morgenstern): The mapping U(x) has the properties:

1. x > y if and only if U(x) > U(y)

2. U((x, y | E)) = P(E)U(x) + (1 - P(E))U(y)

3. If z > w, then the mapping U'(x) based on z and w is related to the
mapping based on x, y by U'(x) = aU(x) + b, where a is a positive
constant.
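The mapping procedure and property 2 of the theorem can be checked against each other with a short sketch. It assumes the reference pair x > y with U(x) = 1 and U(y) = 0, and that the indifference event E has a known probability strictly between 0 and 1; the function names and the string labels for the three cases are illustrative only.

```python
def utility_from_indifference(case, p_event):
    """Utility of z on the scale U(x) = 1, U(y) = 0.

    case "between": x > z > y and z ~ (x, y | E)
    case "above":   z > x     and x ~ (z, y | E)
    case "below":   x > y > z and y ~ (x, z | E)
    """
    if not 0.0 < p_event < 1.0:
        raise ValueError("P(E) must lie strictly between 0 and 1")
    if case == "between":
        return p_event
    if case == "above":
        return 1.0 / p_event
    if case == "below":
        return p_event / (p_event - 1.0)
    raise ValueError("unknown case: " + case)


def contingency_utility(u_first, u_second, p_event):
    """Property 2: U((a, b | E)) = P(E) U(a) + (1 - P(E)) U(b)."""
    return p_event * u_first + (1.0 - p_event) * u_second


U_X, U_Y = 1.0, 0.0

# z above x: x is indifferent to (z, y | E) with P(E) = .25.
u_above = utility_from_indifference("above", 0.25)
# z below y: y is indifferent to (x, z | E) with P(E) = .8.
u_below = utility_from_indifference("below", 0.8)

print(u_above, contingency_utility(u_above, U_Y, 0.25))  # second value equals U(x)
print(u_below, contingency_utility(U_X, u_below, 0.8))   # second value is U(y), up to rounding
```

In each case the utility assigned by the mapping is the value that makes the expected utility of the indifferent contingency equal to the utility of the sure situation.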

The utility scale characterized by Theorem 1 completes an intellectually
satisfying theory of individual decisions for contingencies. Furthermore, it
defines an implementable procedure for establishing utility measurements,
namely, the mapping procedure outlined above. The process can be tedious if
the probability is determined by a sequence of successive approximations.
However, the process is relatively rapid if direct estimation of the
probabilities is employed.

Probabilistic scaling is not the only way to establish a numerical scale
for preferences. As Suppes and others have shown, if the individual can
compare in a consistent fashion the differences in value between pairs of
objects, scales can be generated that are also determined up to a linear
transformation, and that need not be the same as the scales established by
Theorem 1. However, if the scales established by comparing differences are
not the same as those found by comparing contingencies, then the former will
not be linear in probabilities, i.e., property 2 in Theorem 1 will not hold.
These comments have no bearing on which type of scale is "right."
Nevertheless, the usefulness of scales which are linear in probabilities is
so overwhelming that it would require a dramatic solution to some basic
problem to make the pursuit of other forms of scaling of more than academic
interest.
One scale which is certainly of more than academic interest is money.
Numerous theoretical discussions, and some empirical investigation, have made
it plausible that the utility of money is not linear in probabilities.
Assuming that the utility of money is concave "explains" many types of
risk-aversive choices, as well as the purchase of insurance at a premium that
is actuarially excessive (i.e., the expected value of the insurance contract
in money is smaller than the cost). Other assumptions about the shape of the
utility for money curve can "explain" gambling behavior, and, as we saw
in Chapter IV, can resolve certain paradoxical choice phenomena such as the
Allais puzzle. Some decision analysts appear to interpret these results as
implying that the value of money is not "really" linear, i.e., that the value
of money in some undefined absolute sense exhibits "decreasing returns to
scale." "$1,000 is worth less to a millionaire than it is to a pauper."
Statements of this sort have little or no meaning until the measure of worth
is specified. The statement about the millionaire and the pauper appears to
be true, if worth is measured by utilities established by probability
scaling (i.e., by choices among contingencies). However, the statement is
false if worth is measured by what the money will buy. A pauper can buy no
more shares of a given stock with $1,000 than the millionaire. Money is not
linear in probabilities, but it is linear in many significant commodities,
using exchange as the measuring process.

There  is  another  point  about  money  that  is  directly  relevant  to  the  issue 
of group values, namely money has a kind of intersubjectivity that individual
utility  does  not  possess.  Over  a wide  range  of  transactions,  money  has  the 
same  exchange  value  for  all  members  of  society,  and  for  groups  as  well  as 
individuals.  Thus,  we  have  a model  for  a value  scale  which  is  equally  valid 
for  groups  and  individuals. 
The hypothetical individual for which the decision postulates P1-P8' hold
is not identified by the postulates. The consequences (the existence of
probability and utility scales) hold for any entity that fulfills P1-P8'.
Thus, if a group of individuals collectively can be said to have a complete
order of preference over some class of situations X, and the other postulates
hold, then that group has a collective probability scale and a collective
utility scale. Since it is clear that groups do make decisions (i.e., exhibit
choice behavior), there is no a priori reason why P1-P8' should not
characterize these choices. The problem in formulating group value scales is
not P1-P8', but rather the relationship between group and individual choice.
This is the subject for the next section.

2. The Arrow Impossibility Theorem

In the Introduction I described a class of paradoxes which can arise when
an attempt is made to define an aggregation function for a set of individual
judgments, where that aggregation is intended to be consistent in some way with
the individual judgments. Probably the most significant instance of this
type of difficulty is the theorem due to Kenneth Arrow that there is no group
preference function that fulfills a set of desirable and apparently innocuous
conditions.

This theorem has played a major role in recent investigations in welfare
economics and decision theory. On the one hand, it has motivated a large
activity concerned with "resolving" the problem, and on the other hand it has
acted as a restraint on the development of techniques for generating group
preference functions in many areas such as voting methods, social value scales,
and group decision procedures.

In the following sections I present a resolution of the Arrow theorem in
what appears to be a reasonable sense of that term. Since the theorem is
correct, there is no resolution in a strict sense; however, if it can be shown
that the conditions assumed by the theorem are, in fact, more severe than one
would want to accept, and if it can be shown that a minor relaxation of the
conditions leads to a set that is consistent, this appears to be a
justification for the term "resolution."

The formal elements of a group preference function are: a set
I = {i, j, k, ...} of individual members of a group; a set X = {x, y, z, ...}
of objects; a set $K = \{R, \bar{R}, R', \ldots\}$ of vectors of individual
preference relations (that is, each $R = (R_1, R_2, \ldots, R_n)$ in K is an
indexed set of preference relations over X, where the indices correspond to
the members of I); a function F(R) which maps each member of K onto a relation
(group preference relation) over X.* A superfix arrow indicates strict
preference, i.e., $x \vec{R} y$ means xRy and not yRx.

If F is intended to define a general social welfare function, then X would
be interpreted as a set of potential states of society. However, the formal
treatment of the problem is not concerned with the nature of the elements of
X, and for simplicity they will be referred to as objects. Similarly, the
fact that the individual relations $R_i$ and the group relation F(R) are
preference relations is not part of the formalism. The analysis is concerned
with the existence or not of an aggregation function F which takes a vector of
relations as its arguments, and which fulfills a set of conditions that look
reasonable for a group preference function, but might equally be appropriate
for a wide variety of aggregation procedures; e.g., the $R_i$ might be a set of
individual rank orderings of a set of objects on some psychological magnitude,
and F(R) a "representative ordering" for the group. The only issue is the
reasonableness of the stated conditions for the intended application. By and
large, the conditions proposed by Arrow appear to be reasonable for a wide
class of "representative" aggregation functions.

* In Arrow's formalism, the set I is expressed by the indices on the individual
preference relations, X is expressed implicitly as the field of the individual
preference relations, and the set K is characterized as a set of admissible
preference relations, where admissible is taken to mean a set "for which the
social welfare function defines a corresponding social ordering."
The implicit nature of these entities leads to some minor ambiguities;
however, since these do not appear to affect the central possibility theorem,
they will not be pursued here.

In this section, the conditions proposed by Arrow and the impossibility
theorem are stated for reference purposes. In the following sections the
resolution of the theorem is taken up. Except for A1 and A2, the numbering
is kept consistent with Arrow's. Some minor notational differences are
introduced, mainly to simplify the translation of the conditions to the
corresponding ones for scales in the following sections.

It is convenient to have an additional piece of notation. Let $T^B$,
where B is a class of objects and T a relation, designate the relation T
restricted to the class B; that is, B is in X, and $x T^B y$ if and only if
x and y are in B and xTy. $T^{-x}$ will be used to designate the relation T
restricted to the set X - {x}.

A1. For every R in K and every i, $R_i$ is a weak ordering on X.

A2. For every R in K, F(R) is a weak ordering on X.

A1 and A2 simply assert that the individual relations and the group
relation are weak orders on X. The next set of conditions define additional
properties of F(R).

C1. There is a set S in X, such that S contains three members, and for
any possible vector of orderings T of S, there is an R in K such that
$T = R^S$.

This condition is intended to assure that whatever the nature of K, it is
possible to find at least three objects for which all possible orderings
for n individuals are exemplified by some members of K. As Arrow remarks,
the basic theorem is essentially demonstrated for this set of three objects.
The next condition will be introduced by a definition: $\bar{R}$ will
be said to be a forward shift of x, with respect to R, $FS(\bar{R}, x, R)$,
when $\bar{R}^{-x} = R^{-x}$, and for every i and y, if $x R_i y$ then
$x \bar{R}_i y$, and for every i and y, if $x \vec{R}_i y$ then
$x \vec{\bar{R}}_i y$. That is, $\bar{R}$ and R are identical except for x,
and whatever location x has in $R_i$, it is at least as "high" in $\bar{R}_i$.

C2. If $FS(\bar{R}, x, R)$ then $x F(R) y$ implies $x F(\bar{R}) y$. Arrow
calls this condition "positive association of social and individual values."
For the next condition, a further notion is needed. Let C(S,T) designate the
set of x in S, such that for every y in S, xTy. That is, C(S,T) is the set of
maximal elements in S with respect to the relation T. If S has no maximal
elements with respect to T (e.g., if S is the set of all real numbers less
than 1, and T is "greater than") then C(S,T) is the null set. Arrow makes
no provision for this case, but there is no loss, since he is concerned
primarily with the finite special set S which does have maximal elements, for
every T.

C3. For every R and $\bar{R}$, $R^B = \bar{R}^B$ implies
$C(B, F(R)) = C(B, F(\bar{R}))$. This axiom, called the independence of
irrelevant alternatives, is perhaps the key condition in the derivation of the
impossibility theorem. It essentially has the effect that the social
preference between any pair x and y will depend only on the individual
preferences for that pair.

C4. For every x and y in X, there is an R in K, such that not xF(R)y.
This condition, called the condition of Citizen's Sovereignty (or also,
the condition that the group preference is not imposed), is also a key
condition for the impossibility theorem. It posits a decoupling between the
admissible set K and the set of objects X. For example, it rules out the
possibility that there exists in X one object which is preferred by everybody
to some other object for all admissible orderings R. Although intended
only to make sure that the individual preference relations in fact determine
the group preference relation, it asserts something stronger.

C5. For every i there exists x, y, and R such that xF(R)y and not
$x R_i y$.

This condition, non-dictatorship, asserts that for any individual there is
at least one pair of objects and a set of orderings for the other individuals
which generates a group preference contrary to the given individual.

Given the preceding conditions Arrow proves the theorem.

THEOREM 2 (Arrow):* There is no function F with the listed properties.

The proof is somewhat extensive and will not be reproduced here, since
the primary purpose of listing the conditions is to show that a small
modification of them will enable the demonstration of a "possibility theorem."
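A familiar illustration of the kind of conflict the theorem formalizes is pairwise majority voting, which is not discussed in the text at this point and is brought in here only as an example. Majority voting decides each pair from the individual preferences for that pair alone, in the spirit of C3, and favors no single individual, yet for three individuals and three objects it can produce a cyclic group relation, which violates A2. The sketch below assumes strict individual rankings; the function name `majority_prefers` is illustrative.

```python
def majority_prefers(profile, x, y):
    """True if strictly more individuals rank x above y than y above x.

    `profile` is a list of rankings, each a tuple of objects ordered
    from most preferred to least preferred.
    """
    x_wins = sum(1 for ranking in profile if ranking.index(x) < ranking.index(y))
    return x_wins > len(profile) - x_wins


# Three individuals whose rankings are rotations of one another.
profile = [
    ("x", "y", "z"),
    ("y", "z", "x"),
    ("z", "x", "y"),
]

cycle = (
    majority_prefers(profile, "x", "y"),
    majority_prefers(profile, "y", "z"),
    majority_prefers(profile, "z", "x"),
)
print(cycle)  # all three comparisons hold, so the group relation is cyclic
```

Since x beats y, y beats z, and z beats x, the resulting group relation is not transitive and therefore is not a weak ordering on the three objects.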
3. Digression on Measurement Theory

In discussions of measurement theory as applied to psychological and
social magnitudes, a great deal of attention has been paid to the degree to
which a scale is prescribed by a set of measurement rules. A classification
of scales based on such rules was presented in Chapter II, section 2.

Although this transformational approach to measurement is extremely
useful for many theoretical investigations, it tends to obscure some of the
basic features of measuring scales as applied in practice. In particular,
it deemphasizes the role of reference objects for practical scales, and, in
fact, often subtly downgrades these by referring to them as "arbitrary
constants." It is true that within elementary thermometrics, the zero-point
and 100-point determinations are "arbitrary," but that does not mean they are
dispensable. If a doctor is told that the thermometer reading of a patient is
40, he knows nothing about the temperature of the patient until he knows the
specific scale on which it is measured, i.e., until he knows two physical
conditions like the freezing point and boiling point of water which are
associated with two locations on the given scale.

* Arrow states the theorem in what appears to be a more restricted form,
namely a social welfare function satisfying conditions 1-3 and A1 and A2
is either dictatorial or conventional, but any combination of less than all
the conditions could be selected and the assertion made that any welfare
function satisfying them must violate the remaining ones.

A similar requirement holds for ordinal scales if these are to be
used for indirect comparison of objects. This feature of ordinal scales
seems to have been overlooked by measurement theorists in the social
sciences. Thus, it is common to find "x-point scales" (e.g., a 5-point
scale like: 5-very pleasant, 4-somewhat pleasant, 3-neutral, 2-somewhat
unpleasant, 1-very unpleasant) with no definite reference objects at all.
The question whether such scales "measure anything" is more profound than
the question whether it is "legitimate" to perform arithmetical operations
with the assigned numbers. More basically, the issue is whether the
estimates by subjects have the requisite stability to assert, e.g., that the
class of "very pleasant" objects is defined at all.

In the physical sciences, ordinal scales have been used in many
different fields. A well-known physical ordinal scale is the Mohs
hardness scale. This scale is based on the relation "scratches." One
substance x is considered harder than another y if x scratches y. (This is
the basis of the familiar test whether a gem is real or "paste" by seeing
if it will scratch glass.) Although the relation "scratches" is well defined,
it is of little utility to engineers by itself. It becomes useful when it
is augmented to a scale by the addition of a set of reference objects. In
the case of the Mohs hardness scale, there is a set of 10 reference
substances: diamond, sapphire, topaz, quartz, feldspar, apatite, fluorite,
calcite, gypsum, and talc. (Window glass has a hardness of 5.5 on this
scale.) If a substance scratches quartz and is scratched by topaz, it has a
hardness between 7 and 8.


The point is that if you know the location of two substances in the
scale, you know their relative hardness without a direct comparison.

In essence, the Arrow requirements on a group preference function
demand that the function be created without the stability of reference
points. This is done, ostensibly, to rule out the necessity for
interpersonal comparison of utilities, and to maintain the ordinal nature of
the group preference. The difficulty lies, then, not in any of the
specific conditions which Arrow requires, but in the informal contextual
framework, in which the group preference function is required to be a function
of the individual preference relations only. As we shall see below, if
this stringent requirement is relaxed slightly to allow the group preferences
to be a function of both individual preferences and individual reference
objects, the difficulty disappears.

It could be asked whether including reference objects in the group
function does not bring interpersonal comparisons of utility in via the
back door. For myself, I have no strong objections to interpersonal
comparisons of utility. One of the strongest objections is that it cannot
be done, that the "strength of preferences" is a subjective, non-observable
quantity. That particular objection disappears if each individual has a
set of reference objects which are, so to speak, intersubjective.
Interpersonal comparisons on those reference objects are clearly feasible, and
logically unobjectionable. There is no requirement that such comparisons
be couched in terms of the strength of feelings about object x to individual
i, versus the strength of feelings of individual j.

Some of this discussion is proleptic, since the way in which reference
objects will be used to construct a consistent group preference function has
not yet been introduced. However, for the time being, no metaphysical entities
are involved in establishing individual preference scales with reference
objects, and similarly, none with group scales based on the individual
reference objects, provided the latter are observable to the total group.

4. Group Anchored Scales

We  turn  to  the  construction  of  group  preference  functions,  based  on  the 
notion  of  anchored  scales. 

Def. 1: An individual anchored scale consists of a set of objects X, a
weak ordering relation R_i on X, and a designated subset A_i of X. The scale
value S_i(x) of an object x in X is defined by the rule:

S_i(x) = a means a is in A_i, a R_i x, and for every b ≠ a in A_i,
a R_i b implies x R_i b.

This definition can probably be most conveniently explicated by a
diagram, where A_i = {a, b, c, d}.


S_i(x) = a means x is in the interval between b and a. It is convenient, but
not necessary, that the members of A_i be strongly ordered by R_i, i.e.,
if a, b are in A_i and a ≠ b, then either a R_i b or b R_i a. It is also convenient
but not necessary that A_i contain the maximal element in X with respect to
R_i, if there is one. For the case that A_i does not contain the maximal
element of X (as illustrated by y in the diagram) an additional value must
be defined for objects above the maximal object in A_i, say m. Thus S_i(y) = m
means y R_i a. (If there is no maximal element, but a minimal element, then
for some purposes it may be convenient to switch the definition of the
interval, and call S_i(x) the next lowest member of A_i. However, this leads
to unnecessary complications with constructing a group scale if not all of
the individual ordering relations have minimal elements.)

In the examples used to illustrate constructions, the A_i's are
treated as if they were finite. This, again, is not necessary, but
simplifies much of the discussion. It does not seem meaningful to speak of
an infinite set of reference objects unless they are generated by some
mathematical rule, and in a sense, this negates the basic notion of
reference object. However, there is no logical difficulty in allowing
infinite sets.

The scale can be interpreted as inducing a new ordering relation
on X which can be designated R_i*, where x R_i* y means S_i(x) R_i S_i(y).
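The rule in Def. 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weak ordering R_i is encoded as a numeric score `pref` (higher means more preferred), the label `TOP` stands for the extra value m, and the objects `w`, `x`, `y` and their scores are hypothetical.

```python
TOP = "m"  # hypothetical label for the extra value above every anchor


def scale_value(x, anchors, pref):
    """Return S_i(x): the least-preferred anchor a with a R_i x.

    pref maps each object to a number (higher = more preferred); this
    numeric encoding of the weak ordering R_i is an assumption.
    """
    at_or_above = [a for a in anchors if pref[a] >= pref[x]]
    if not at_or_above:
        return TOP  # x lies above the maximal anchor
    return min(at_or_above, key=lambda a: pref[a])


def induced_prefers(x, y, anchors, pref):
    """x R_i* y: compare two objects only through their scale values."""
    level = dict(pref)
    level[TOP] = float("inf")  # TOP sits above every anchor
    sx = scale_value(x, anchors, pref)
    sy = scale_value(y, anchors, pref)
    return level[sx] >= level[sy]


anchors = ["a", "b", "c", "d"]
# Hypothetical preferences: d < w < c < b < x < a < y
pref = {"d": 1, "w": 1.5, "c": 2, "b": 3, "x": 3.5, "a": 4, "y": 5}
```

Here x falls in the interval between b and a, so its scale value is a, while y lies above every anchor and receives the extra value. Under the induced ordering, x and a are equivalent even though the individual ranks a strictly above x.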

Def. 2: A group anchored scale F(S) consists of a set X, a set A,
and a weak ordering relation G on A. S, for this definition, is a vector
of individual anchored scales, i.e., S = (S_1, S_2, ..., S_n). A is the Cartesian
product of the anchor sets of the individual scales, i.e., A = A_1 × A_2 × ... × A_n.
(The Cartesian product is the set of all n-tuples that can be formed by
picking one object from each of the n individual anchor sets. This is
illustrated in Fig. 45.) Since A is the Cartesian product of individual
anchor sets, A is not a subset of X. There are several different ways in
which the relation G can be extended to X to produce an anchored scale on
X. The simplest would appear to be to extend it to X with the definition

x G y means S(x) G S(y).

Strictly speaking, G in the expression x G y is a different relation than
in a G b, where a and b are in A; however, the distinction is sufficiently
expressed by the difference in notation for objects in A and other objects.




The following diagram illustrates what is going on. Suppose there
are two individuals, with A_1 = {a,b,c,d} and A_2 = {e,d,c}. (There is no logical
relationship between the anchor sets of different individuals. They can be
identical, overlap somewhat, or be entirely distinct. In practice, of
course, there would be many advantages to having a common set of reference
objects.)


Figure 45. Group Ordinal Scale


The n-tuples of individual reference objects determine a set of
n-dimensional "boxes." The group scale value S(x) is determined by the box
in which x lies. Illustrated is the case S(x) = (c,d), that is, S_1(x) = c,
S_2(x) = d.

G (not illustrated for reasons that will be clear shortly) orders the
n-tuples. In the diagram, this means G orders the boxes. In this respect,
the "edge" boxes like the one containing y in the diagram are treated like
all the others.

A key step in constructing a group anchored scale consists in
postulating the set of admissible individual anchored scales as follows: The
set K = (S, S', S'', ...) of admissible anchored scales, for a group scale F,
consists of all n-vectors of individual anchored scales S = (S_1, S_2, ..., S_n)
such that A_i is identical for all S_i. That is, for all members of K and
all individuals, the set of reference objects remains fixed, and a R_i b if
and only if a R_i' b for all a, b in A_i, and all R_i, R_i' in K.

Intuitively,  what  is  involved  should  be  fairly  clear.  Each  individual 
selects  a set  of  reference  objects.  These  he  orders  according  to  his  own 
preferences,  and  it  is  assumed  that  for  the  duration  of  the  given  group 
preference  function,  he  does  not  change  his  preferences  for  this  special  set. 
The  group  function  first  addresses  the  various  combinations  of  the  individual 
reference  objects  and  imposes  an  order  on  them.  For  any  objects  not  in  the 
reference  sets  of  the  individuals,  the  group  preference  relation  is  extended 
in  the  obvious  way,  by  ordering  them  in  the  same  way  as  the  "regions"  in 
which they fit as determined by the individual scales.

This leads to a slightly anomalous situation with respect to objects
that belong to some but not all of the individual anchor sets. These
cases have been dealt with below by excluding from the conditions imposed
on the group function any x's which belong to any of the A_i's. This is
perhaps a little heavy-handed; however, it saves expressing the complex
of exceptions that would be needed if the partial members of reference sets
should be included in the conditions. This approach does not appear to
violate the spirit of a general welfare function, since we are primarily
concerned with removing the difficulties with those objects for which all
possible preference orderings are allowed. In the special case that the
individual reference sets are identical, the blanket exclusion appears just
right.

It perhaps should be pointed out that the definition of a group anchored
scale contains an implicit assumption, namely that the group is indifferent
between all objects x, y such that S(x) = S(y). This can be softened
somewhat if we assume that the selection of a scale by an individual is
equivalent to the assumption that he considers, for the purposes of group
aggregation, x equivalent to y if S(x) = S(y), and we add the assumption that the
group is indifferent between x and y if all members of the group are
indifferent between x and y. Strictly speaking, these comments are not part
of the formal possibility theorem that is being sought. However, it is
clear that for any judgment as to whether the approach is reasonable or not,
the question whether an individual can generate a fine enough scale to make
him accept a group aggregation is relevant.

Nothing in the preceding determines in any way the number of members
of the A_i's. Clearly, if any A_i is empty, then individual i is a dummy,
and has no influence on the group preference. However, if there is only
one member a of A_i, the individual can discriminate to the extent of saying
whether a given object is better or worse than a. Condition 1 below requires
that each A_i have at least two members.

We now proceed to transliterate the Arrow conditions into corresponding
ones appropriate for anchored scales. The primary change is excluding the
anchor sets from the conditions. However, Condition 2 is expressed in such a way
that it applies to the anchor sets in a derivative fashion. It is clear
that some conditions would want to be imposed on the group function G as
it applies to the anchor sets.

Condition 1: There is a set of three objects B, such that B does
not overlap with any A_i, and for every T, where T is a possible ordering of
three objects by n individuals, there is an S in K such that R*^B (R*
restricted to B) is isomorphic to T.

Recall that R_i* is the imposed relationship defined by R_i "restricted"
to S_i(x).

Condition 1 states that there are at least three objects which can
take on any possible ordering (by n individuals) of the scale values of the
objects. The condition is expressed in terms of the R*'s being isomorphic
to T rather than identical with it on B, to save a certain amount of
circumlocution concerning the relation of T to A. As noted above, Condition 1
requires that every A_i have at least two members (at least three intervals
for each individual scale).

Def. 3: FS(x,S,S') means FS(x,R,R'), where R and R' are the ordering
relations corresponding to S and S' respectively. This definition simply
transfers the meaning of FS from relations to scales.

Condition 2: If x is not in any A_i and FS(x,S,S'), then if x G y, x G' y.

This is a fairly straightforward translation of Arrow's condition 2 to
anchored scales. It is, of course, weaker than the Arrow condition, since
we exclude the anchor sets.

Condition 3: If S^B = S'^B, then F(S)^B = F(S')^B. Here we must include
in the operation S^B that, in restricting a scale to the set B, we also
retain A.

Condition 4: For every x and y, x, y not in any A_i, there is an S such
that x G y, where G is the group relation associated with F(S).

Again, this is the direct analogue of Arrow's condition, with the
exception of the reference sets. In a sense, F(S) is imposed for A. The
group selects a G, and this remains fixed, as far as A is concerned, for
every acceptable S.*

Condition 5: For every i, there exist x, y not in A and an S such that
S_i(x) > S_i(y), and y G x.

It clearly would be unfair to ask only x R_i y, since this might mean
S_i(x) = S_i(y), which would be a poor negation of our dictator.

Given the above transcription of the Arrow conditions we can state the
possibility theorem:

THEOREM 3. There is a function F(S) which satisfies conditions 1-5.

Actually, there are an infinite number of such functions. However,
for the assertion of the compatibility of conditions 1-5, it is necessary
to exhibit only one. A simple function which fulfills the conditions is one
that could be called modified sum of ranks. To each of the individual
reference objects, a rank-order number is assigned, say in the ascending
order of preference. Call this rank-order number S_i^o(a), and derivatively,
S_i^o(x) is the rank-order number of S_i(x).

We then define

$S^o(x) = \sum_i S_i^o(x)$

x G y if and only if $S^o(x) \ge S^o(y)$.

It is clear that G is a weak order over X, since every x has a number
assigned to it by S^o and ≥ is a weak order. Thus the definition assures
that this F is indeed an anchored scale.
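A minimal Python sketch of the modified sum of ranks follows, using the two anchor sets from the Figure 45 example. Several things here are assumptions made for illustration: the function names, the label `TOP` for objects above every anchor (given one rank more than the best anchor), and the preference order that individual 2 places on e, d, c.

```python
TOP = "m"  # hypothetical label for objects above every anchor


def rank_numbers(anchors_worst_to_best):
    """Assign 1, 2, ... in ascending order of preference; TOP gets one more."""
    ranks = {a: k for k, a in enumerate(anchors_worst_to_best, start=1)}
    ranks[TOP] = len(anchors_worst_to_best) + 1
    return ranks


def group_score(scale_values, rank_tables):
    """S^o(x): sum over individuals of the rank number of S_i(x)."""
    return sum(table[s] for s, table in zip(scale_values, rank_tables))


def group_prefers(sx, sy, rank_tables):
    """x G y if and only if S^o(x) >= S^o(y)."""
    return group_score(sx, rank_tables) >= group_score(sy, rank_tables)


rank_tables = [
    rank_numbers(["d", "c", "b", "a"]),  # individual 1: a is best
    rank_numbers(["c", "d", "e"]),       # individual 2: order assumed, e best
]

# S(x) = (c, d) as in the illustrated case; S(z) = (a, c) is hypothetical.
score_x = group_score(("c", "d"), rank_tables)
score_z = group_score(("a", "c"), rank_tables)
```

Because the group relation depends on x only through the n-tuple S(x), every object in the same "box" receives the same score, which is the implicit indifference assumption of the definition.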

Taking up the conditions in turn: Condition 1 can be satisfied, as
previously mentioned, if each A_i has at least two members.

* In a wider sense, this is not necessary, as the proof of the possibility
theorem below shows. Thus, F(S) could be required to "work" for every A,
as well as for every set of x not including A, providing condition 3 is
restricted to the situation after an A has been selected. The specific
preference scale used in the possibility theorem (modified sum of ranks)
does in fact work for every finite A. But to be quite frank, I haven't
found a sufficiently general notation within which this more powerful kind
of condition can be expressed.

Condition 2: If we have FS(x,S,S'), then S'^o(x) ≥ S^o(x) and S'^o(y) =
S^o(y) for y ≠ x. If S^o(x) ≥ S^o(y), then S'^o(x) ≥ S^o(y) = S'^o(y), from which the
conclusion S'^o(x) ≥ S'^o(y) follows.

Condition 3 is immediate. Restricting the individual scales to the
set B (plus A) does not change S^o(x) where x is in B.

Condition 4 is easily satisfied. We simply have to posit that there is
an S such that S^o(x) > S^o(y), which obtains if, e.g., S_i^o(x) > S_i^o(y) for every
i (unanimity).

Condition 5 is equally easily satisfied. We only need find a case where
S_j(x) > S_j(y) for every j ≠ i, and S_i(x) = S_i(y) - 1 (i.e., everybody except
i prefers x to y, and i ranks x one level lower than y).

This completes the possibility theorem for group anchored scales. It
should be clear that the specific group scale used above, namely

$S^o(x) = \sum_i S_i^o(x)$,

is only one of an infinite set which would fulfill the conditions.
The modified sum of ranks group scale was selected mainly because it made
the demonstration easy. There was one other secondary reason. Ordinary
sum of ranks does not fulfill Arrow's Condition 3. The modified sum of
ranks does not get into trouble because the anchor set is retained in going
to a subset. Thus, the modified sum of ranks is a good elementary example
of how it is feasible to have an ordinal group scale which does not run
into the difficulties attendant on aggregation of "pure" relations. It
might be objected that group anchored scales defined by some function of
ordinal numbers on the individual anchors are not "purely ordinal." In
general such functions will not be invariant over separate monotonic
transformations on the individual scales (e.g., if the order numbers of one of the
individual scales were multiplied by a factor of 1000, the others remaining
the same, the group scale would be dictatorial). That does not mean,
however, that there are hidden cardinal assumptions in the notion of group
anchored scale. A group scale can be fully specified with no reference to
numbers whatsoever.
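The remark about monotonic transformations can be checked with a small Python sketch. The three individuals and their rank numbers below are hypothetical, chosen only to show that stretching one individual's order numbers by 1000 makes the summed scale follow that individual's own ordering.

```python
def group_order(rank_numbers, objects):
    """Order objects from most to least preferred by summed rank numbers."""
    totals = {o: sum(r[o] for r in rank_numbers) for o in objects}
    return sorted(objects, key=lambda o: totals[o], reverse=True)


def stretch(ranks, factor):
    """A monotonic transformation: multiply every rank number by factor."""
    return {o: factor * v for o, v in ranks.items()}


objects = ["x", "y", "z"]
# Hypothetical rank numbers (higher = more preferred).
ind1 = {"x": 1, "y": 2, "z": 3}
ind2 = {"x": 3, "y": 2, "z": 1}
ind3 = {"x": 3, "y": 1, "z": 2}

before = group_order([ind1, ind2, ind3], objects)
after = group_order([stretch(ind1, 1000), ind2, ind3], objects)
```

With the original numbers the group order is x, z, y. After the stretch it becomes z, y, x, which is exactly individual 1's ordering, so that individual has in effect become a dictator.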

Although the modified sum of ranks group scale was selected primarily
for its simplicity in showing that the conditions can be fulfilled, it
should be of use in any situation where the ordinary sum of ranks appears
relevant. A slightly casual suggestion along these lines is contained in
the following section.

Someone might want to object that the modified sum of ranks, and
similar group scales, falls down if the anchor sets are augmented or reduced.
In some sense that is correct, but for the present purposes, augmenting
or reducing the anchor sets is equivalent to defining a new group function F,
and it is not expected that Condition 3, e.g., will hold across different F's.

Since there are many different group scales which will fulfill
conditions 1-5, we suddenly have an embarrassment of riches. I would guess that
one of the reasons Arrow dealt with such a spare and stringent set of basic
notions was the hope that a few conditions would essentially determine the
form of a rational group function, or perhaps limit the possibilities to a
well-defined subclass. That, of course, would have been a highly significant
result. Unfortunately, that hope is not fulfilled by group anchored scales.
This, of course, leads to the important practical question: how, in any
given decision situation, one might go about selecting a particular group
scale.

At the present time, there does not appear to be a way to determine
the "right" group scale, based on some additional requirements. This is
true only so long as we stay in the context of ordinal scales. If the
notion of numerical utilities is introduced, then as demonstrated in Section 6
below, a few additional assumptions sharply restrict the form of a group
scale. But within the ordinal context, about all one can say is that a
group can adopt a particular group scale in about the same spirit in which
it might adopt a constitution, or Robert's Rules of Order, or any other
formal mechanism for systematizing decision-making.

More  generally,  it  is  my  impression  that  we  have  very  little  knowledge 
concerning  anchored  scales  in  social  measurement.  This  is  particularly 
true  for  those  cases  where  the  objects  being  scaled  are  very  complicated 
entities  such  as  "states  of  society,"  or  political  or  social  systems.  For 
this  type  of  social  scaling,  it  seems  intuitively  clear  that  criteria  (even 
for  an  individual)  are  highly  multi-dimensional,  and  the  problem  of  defining 
reference  objects  must  be  tackled  on  the  level  of  more  elementary  sub-scales. 

This consideration is one of the driving forces behind the social
indicators movement. Society may be an object which is just too complicated to
view "in toto."

5. Example: Electing a President

A relatively straightforward example of a possible application of the
notion of anchored scales can be seen in voting schemes for public officials.
One of the most serious consequences of the type of inconsistency formalized in
the Arrow Theorem is the fact that elections need not result in the selection
of the candidate most favored by the electorate. This fact is obscured in
presidential elections in the United States by the two-party system and the
large proportion of cases in which there are only two major candidates. It is
a triviality that majority vote produces a "consistent" group preference relation
when there are only two alternatives. However, for the case of more than two
alternatives, several undesirable types of outcome are possible. In the French
style election where there is a run-off between the two candidates with the
highest initial votes, there is a good chance that the candidate with the most
overall support in the electorate will be eliminated on the first round. For
example, consider the case where there are three candidates, A, B, C, and three
voting blocs, X, Y, and Z, with the preferences (where 1 means most preferred,
3 means least)

        A     B     C     % of vote
X       1     2     3     25
Y       2     1     3     37.5
Z       2     3     1     37.5

Since  B and  C are  the  two  winners  on  the  first  round,  A is  eliminated,  and  the 
winner  of  the  run-off  is  B.  However,  A would  win  a two-candidate  majority 
vote  against  either  B or  C,  receiving  62.5%  of  the  vote  in  either  case. 
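The run-off arithmetic for this first example can be reproduced with a short Python sketch. The data are the three blocs from the table; the function names and the representation of each bloc as a (share, ranking) pair are assumptions made for illustration.

```python
# Each bloc: (percent of vote, ranking from most to least preferred).
blocs = [
    (25.0, ["A", "B", "C"]),  # bloc X
    (37.5, ["B", "A", "C"]),  # bloc Y
    (37.5, ["C", "A", "B"]),  # bloc Z
]


def first_choice_totals(blocs, candidates):
    """Percent of first-place votes for each candidate."""
    totals = {c: 0.0 for c in candidates}
    for share, ranking in blocs:
        totals[ranking[0]] += share
    return totals


def pairwise_share(blocs, a, b):
    """Percent of the electorate ranking a ahead of b."""
    return sum(s for s, r in blocs if r.index(a) < r.index(b))


def runoff_winner(blocs, candidates):
    """Top two on first choices, then a majority vote between them."""
    totals = first_choice_totals(blocs, candidates)
    a, b = sorted(candidates, key=lambda c: totals[c], reverse=True)[:2]
    return a if pairwise_share(blocs, a, b) > pairwise_share(blocs, b, a) else b


winner = runoff_winner(blocs, ["A", "B", "C"])
a_vs_b = pairwise_share(blocs, "A", "B")
a_vs_c = pairwise_share(blocs, "A", "C")
```

The sketch selects B as the run-off winner, while A takes 62.5% in a two-candidate contest against either B or C, matching the figures in the text.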

It is easy to construct cases in which the "worst" candidate wins.
Consider the following case:

        A     B     C     % of vote
X       1     2     3     26
Y       3     1     2     28
Z       2     3     1     46

As before, A is eliminated on the first round, pitting B against C. Since
B is preferred by 54% of the electorate, he wins. However, if we use a
simple scaling procedure, namely the average rank of each candidate in the
voters' preference relations, the average ranks of A, B, and C are 2.02, 2.18,
and 1.80 respectively. B, with the poorest average rank, takes office! It
seems highly likely that in the course of French elections situations at
least as anomalous as those illustrated have occurred.
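The average ranks quoted for this second example follow from weighting each bloc's ranks by its share of the vote. A small Python sketch of that arithmetic, with the table data and hypothetical function names:

```python
# Each bloc: (percent of vote, rank given to each candidate; 1 = most preferred).
blocs = [
    (26, {"A": 1, "B": 2, "C": 3}),  # bloc X
    (28, {"A": 3, "B": 1, "C": 2}),  # bloc Y
    (46, {"A": 2, "B": 3, "C": 1}),  # bloc Z
]


def average_rank(blocs, candidate):
    """Vote-weighted mean rank of a candidate (lower number = better)."""
    total = sum(weight for weight, _ in blocs)
    return sum(weight * ranks[candidate] for weight, ranks in blocs) / total


def share_preferring(blocs, a, b):
    """Percent of the electorate ranking a ahead of b."""
    return sum(weight for weight, ranks in blocs if ranks[a] < ranks[b])


averages = {c: round(average_rank(blocs, c), 2) for c in "ABC"}
b_over_c = share_preferring(blocs, "B", "C")
```

For instance, the average rank of B is (26·2 + 28·1 + 46·3)/100 = 2.18, the numerically largest and therefore poorest of the three, even though B beats C in the run-off with 54%.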

Even in the U.S., we occasionally have dubious cases. Take the
Roosevelt, Taft, Wilson election of 1912.

        Roosevelt   Taft   Wilson   % of vote
R       1           2      3        30
T       2           1      3        25
W       2           3      1        45

The tally of votes in the three-way election does not tell the whole
story. A plausible assumption is that those who voted for Taft or Roosevelt
would have preferred either to Wilson, giving the preference pattern in the
table. With this preference pattern, Roosevelt would have won a two-candidate
majority vote against either Taft or Wilson, and Taft would have won
in a straight majority contest with Wilson. Thus, by majority vote, the
preference order is Roosevelt, Taft, Wilson. The average preference rank
for Roosevelt is definitely better than the other two. Wilson barely nudges
out Taft.
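The pairwise majorities and average ranks for the 1912 pattern can be verified the same way. The Python sketch below uses the stylized shares from the table (30, 25, 45), not the actual election returns, and the names are hypothetical.

```python
# Stylized blocs from the table: (percent of vote, ranks; 1 = most preferred).
blocs = [
    (30, {"Roosevelt": 1, "Taft": 2, "Wilson": 3}),  # Roosevelt voters
    (25, {"Roosevelt": 2, "Taft": 1, "Wilson": 3}),  # Taft voters
    (45, {"Roosevelt": 2, "Taft": 3, "Wilson": 1}),  # Wilson voters
]
candidates = ["Roosevelt", "Taft", "Wilson"]


def share_preferring(blocs, a, b):
    """Percent of the electorate ranking a ahead of b."""
    return sum(w for w, ranks in blocs if ranks[a] < ranks[b])


def pairwise_wins(blocs, candidates):
    """Number of two-candidate majority contests each candidate wins."""
    return {
        a: sum(share_preferring(blocs, a, b) > 50 for b in candidates if b != a)
        for a in candidates
    }


def average_rank(blocs, candidate):
    """Vote-weighted mean rank (lower number = better)."""
    return sum(w * ranks[candidate] for w, ranks in blocs) / sum(w for w, _ in blocs)


wins = pairwise_wins(blocs, candidates)
averages = {c: round(average_rank(blocs, c), 2) for c in candidates}
```

Roosevelt wins both of his pairwise contests and Taft wins one, giving the majority order Roosevelt, Taft, Wilson. The average ranks come out to 1.7, 2.2, and 2.1, so on that measure Wilson edges out Taft.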

These anomalies can be straightened out by using a form of anchored
voting. In the case of presidential elections, there is a natural set of
anchors, namely the list of past presidents. In the crudest application,
each individual voter at his leisure would rank order the past presidents.
On election day, the voter reports the scale position of each of the present
candidates in his scale of past presidents. Tallying would consist of
adding up the scale values of each candidate, and the one with the highest
scale sum wins. There is no requirement that the individual voters rank
the past presidents in the same order; each can have his own personal ordering.
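A rough Python sketch of the tallying step may help. Everything concrete in it is hypothetical: the anchors are placeholders P1 to P4 rather than actual presidents, the three ballots are invented, and for brevity a candidate rated above every anchor is not handled.

```python
def tally(ballots):
    """Sum each candidate's scale value over all ballots.

    Each ballot is (anchors ordered worst to best, {candidate: reported anchor}).
    The scale value counted is the rank number of the reported anchor in that
    voter's own ordering.
    """
    totals = {}
    for anchors, report in ballots:
        rank = {a: k for k, a in enumerate(anchors, start=1)}
        for candidate, anchor in report.items():
            totals[candidate] = totals.get(candidate, 0) + rank[anchor]
    return totals


# Three hypothetical voters, each with a personal ordering of the same anchors.
ballots = [
    (["P1", "P2", "P3", "P4"], {"A": "P4", "B": "P2", "C": "P1"}),
    (["P4", "P3", "P2", "P1"], {"A": "P3", "B": "P1", "C": "P4"}),
    (["P2", "P1", "P4", "P3"], {"A": "P4", "B": "P2", "C": "P3"}),
]

totals = tally(ballots)
winner = max(totals, key=totals.get)
```

Note that the same reported anchor can carry different rank numbers for different voters, since each ballot is scored against that voter's own ordering of the anchors.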

Anchored voting would not only guard against electing a non-preferred
candidate, it would give a relatively unambiguous rating to all of the current
candidates. There is one drawback to the elementary election procedure just
described. There might be a tendency for voters to give an artificially high
rating to their favorite candidate and an artificially low rating to all the
others. This would, of course, negate the procedure. There is a form of proper
scoring procedure that is applicable to anchored ratings which would keep the
voters honest. It is similar to the bidding rule described in Chapter III.
In the case of voting, a large set of initial candidates is chosen, say 100 as
a round figure. The electorate ranks all of these against their presidential
scale. The final slate is a much smaller number, say 10, selected out of the
initial list at random. The candidate with the highest rating in this
sublist is declared the winner. In this case, if the voter inflates his rating
for his favorite candidate, he runs the risk of having that favorite
candidate ruled out by the random selection, in which case he would have penalized
his lower ranking choices.
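The random-slate step can be sketched as follows. This is an illustration only: the candidate names and their aggregate ratings are made up, and the function name and the use of a seeded generator are assumptions, not part of the procedure as described.

```python
import random


def draw_winner(ratings, slate_size, rng):
    """Draw a random final slate and return it with its highest-rated member.

    ratings maps each initial candidate to an aggregate anchored rating.
    """
    slate = rng.sample(sorted(ratings), slate_size)
    winner = max(slate, key=lambda c: ratings[c])
    return slate, winner


# 100 hypothetical initial candidates with distinct made-up ratings.
ratings = {f"cand{k}": (k * 37) % 101 for k in range(100)}

slate, winner = draw_winner(ratings, 10, random.Random(0))
```

Since any candidate may be dropped by the draw, a voter's ratings of candidates other than his favorite can end up deciding the outcome, which is the incentive the text relies on.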

As  it  stands,  the  form  of  election  procedure  just  described  is  somewhat 
more  cumbersome  than  present  procedures.  It  is  not  without  compensating 
virtues.  As  a television  show,  the  drawing  for  the  final  slate  of  candidates 
could  be  a highly  dramatic  event.  A certain  amount  of  streamlining  would, 
of  course,  be  feasible.  The  set  of  anchors  need  not  be  all  past  presidents, 
but  some  smaller  subset.  The  number  of  initial  candidates,  and  the  size  of  the 
final  slate  could  be  pared  by  judicious  statistical  design. 

6. Cardinal Group Utility

Having arrived at the point where a group preference function is at
least logically feasible, there doesn't appear to be a strong reason why
the simplifications possible through numerical utilities shouldn't be taken
advantage of. There seems to have been fairly general acceptance of the
possibility of ascertaining utility functions for individuals using
probability scaling as outlined in Section 1. Actually, the acceptance has
been perhaps more enthusiastic than experimental attempts to determine
such scales warrant. The question just how consistent individuals are in
rating contingencies remains unclear. Most of the practical applications
have been more in the spirit of using utility theory as a prescriptive set
of rules (a sort of axiological logic) rather than as a straightforward
measurement technique.

For the time being we shall forego the question whether a prescriptive
or descriptive role for individual utilities is involved, and examine the
consequences of a simple additional postulate concerning the nature of group
preference functions.

The group functions we want to examine, then, are of the general form
F(U,A,X) where as before X is the set of objects to be evaluated by the
group, U = (U_1, U_2, ..., U_n) is a vector of utility scales, and A = (A_1, A_2, ..., A_n)
is a vector of sets of reference objects. In this case, each A_i is simply
a pair of objects, (a_i, b_i), where, to avoid triviality, we will assume
U_i(a_i) ≠ U_i(b_i).

There are two levels at which a function F can be sought: (1) F maps
the vectors U onto an ordinal scale, that is, onto numbers fixed only up
to a monotonic transformation; (2) F maps the vectors U onto a scale which
is itself a utility scale (fixed up to a linear transformation). In the
first case, F is simply a device to generate a weak ordering on the utility
vectors. In the second case it is a device to generate a cardinal
scale for the group. The second might appear to be a much stronger
requirement than the first. As it turns out, the step from (1) to (2) is
small.

The utility vectors U = (U_1, ..., U_n) form a space which
will be assumed to be Euclidean n-space. The set X (... consequences)
is mapped onto the utility space by the ...
Note that if two objects x and y are considered equivalent by all members of
the group, then x and y are mapped onto the same point in U.

Given the feasibility of group preference functions, a reasonable place
to start would be to assume that there is a complete order of group preference
on U. However, an appropriate theory has already been developed which
is more informative, namely the theory generated in Chapter IV to deal with
incomplete information. If the vectors $x = (x_1, \ldots, x_n)$ of that theory are
reinterpreted as vectors of individual utilities, and the choice sets C(S)
are reinterpreted as resulting from group choice, then postulates H1-H5
appear as reasonable for group as for individual decisions.

A specific group decision problem is represented by a set S of attainable
utility vectors in U. The cooperative nature of the decision is expressed
by  including  the  outcomes  of  all  potential  coordinated  actions  of  the  members 
of  the  group.  In  addition,  it  is  assumed  that  all  probability  combinations 
of  the  actions  (equivalent  to  all  probability  combinations  of  the  outcomes) 
are  included  in  S.  In  this  way  the  convexity  of  the  set  S is  assured. 

A key assumption is that the group choice C(S) is solely a function of
the set S; i.e., it is a function solely of the individual utility vectors.
This assumption has been questioned when contingencies are included in the
outcomes.^ Consider the utility space for a group of two, illustrated in
Fig. 46. Individual utility theory implies
$U_i(z) = U_i((x,y|\tfrac{1}{2})) = U_i((u,v|\tfrac{1}{2}))$
for both i = 1 and 2. Hence, they are all mapped on the same point and
treated identically by the group choice function. It has been objected that
z can be a sure equal allocation of utility between the two individuals, whereas
$(u,v|\tfrac{1}{2})$ allocates all the utility to one individual and none to the other.
In a similar vein, $(x,y|\tfrac{1}{2})$ assures that in either case the two individuals
are treated equally whereas $(u,v|\tfrac{1}{2})$ involves unequal allocations.

An appeal to unfairness toward individuals in this situation seems inappropriate.
By definition, if a given individual finds two outcomes equivalent,
they are equally acceptable.* The question whether there are group
interests which override the interests of the individuals is more profound.
One consideration relating to fair division is discussed below after introducing
the equivalence condition. Another consideration is whether the group
is more or less risk averse than its members. For example, in Fig. 46, if
x is disaster for both, and y is utopia, each individual separately might
find the gamble equivalent to the intermediate state z, whereas the group
strongly prefers z to the gamble. This is not an easy issue to handle with
generalities. The not fully conclusive data on the "risky shift" phenomenon
appear to indicate that, if anything, groups are less conservative than
individuals.^ The fact that highly risky group behavior such as wars and
revolutions is rather common suggests that groups are not risk-averse, at
least under some circumstances. Perhaps the strongest support for assuming
that the group is neither more nor less risk averse than its members comes
from the unanimity principle applied to contingencies. If all the members
find a given contingency as desirable as a given "certainty equivalent,"
then by unanimity the group would make the same judgment.

The foregoing is not an overwhelming justification of the assumption
that group choice is a function solely of the individual utilities. It
appears solid enough to expect that the assumption is appropriate for a useful
range of group decisions.

* Since the "objects" being evaluated by each individual are states of the
group, e.g., allocations of rewards, feelings concerning the desirability
of equal allocations can be absorbed in the individual utility scales. It
is quite possible, contrary to the unfairness argument, that many individuals
would prefer a gamble in which they at least had a chance to "come out ahead"
to a bland equal distribution.


To recap conditions H1-H5: For any subset S of U which is closed,
bounded from above, and convex, there exists a choice set C(S) which, on
the present interpretation, represents the points in S which the group would
select if S were the alternatives available in a decision. The function
C(S) defines a partial preference relation on U, where x >' y means x is in
C(S) and y is in S. The ancestral relation >* of >' is transitive but not
necessarily connected. >* is acyclic, continuous, Archimedean, and fulfills
the dominance condition. On these assumptions, there is a complete order
of group preference on U which is an extension of >*.

Although  the  role  of  the  reference  set  A is  not  explicit  in  the  above 
formalism,  it  cannot  be  overlooked.  Since  individual  utility  functions  are 
fixed  only  up  to  a linear  transformation,  the  reference  set  is  required  to 
insure  the  stability  of  the  choice  function.  If  a group  choice  function 
C(S)  is  defined  for  a particular  assignment  of  numbers  to  the  reference 
objects,  then  if  one  or  more  of  the  individual  scales  is  changed,  the  choice 
function  must  be  rescaled  accordingly. 

Some additional comment on the conditions H1-H3 as they apply to group
decisions is probably in order. H1, as already noted, has the effect of
assuring that the group choice will be a function of the individual utilities.
H2, acyclicity, appears to be a fairly stalwart postulate. The
evils that can arise with cyclic preferences have been extensively explored
in the literature.* Acyclicity plays the same role in the present approach
as independence of irrelevant alternatives plays in the Arrow postulate set,
and guards against the same kind of instability.

* E.g., the rather persuasive notion of a "money pump."^8

Acyclicity  has  one  rather  stern  consequence;  it  rules  out  a basic  tenet 
of  cooperative  game  theory,  namely  the  postulate  of  individual  rationality. 
The  postulate  of  individual  rationality  holds  that  an  individual  will  not 
accept  an  outcome  that  is  less  preferred  (by  him)  than  an  outcome  he  can 
guarantee  with  his  own  efforts.  For  example,  a player  in  a multi-person 
game  can  guarantee  himself  the  max  min  of  his  payoff  where  the  minimization 
is  taken  over  all  coordinated  strategies  of  his  opponents.  At  first  glance, 
there  is  no  reason  why  the  player  should  accept  anything  less  than  this 
guaranteed  amount.  However,  in  the  present  formalism,  a given  set  S of 
alternatives  can  be  generated  by  many  different  decision  situations  with 
different  individual  rationality  points.  If  C(S)  is  made  conditional  on 
the  individual  rationality  points  (which  it  is  for  most  cooperative  game 
solution concepts) then within the same set S we can have x >' y and
y >' x, a violation of acyclicity.

This  point  is  not  an  objection  to  cooperative  bargaining  models  per 
se;  however,  it  appears  to  be  a definite  objection  to  bargaining  models 
as  a basis  for  group  decisions  with  continuing  groups,  i.e.,  groups  that 
can  be  expected  to  conduct  a sequence  of  interrelated  decisions. 
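
The conflict between acyclicity and individual rationality can be made concrete with a small numerical sketch. It is illustrative only: it assumes a finite set S of utility pairs for a group of two, and it uses the Nash product as a stand-in for a cooperative solution concept that is conditional on the individual rationality point. The function name `nash_choice` and all numbers are hypothetical and do not come from the text.

```python
def nash_choice(S, d):
    """Pick the point of S that maximizes the Nash product relative to d.

    d is the individual rationality (guaranteed) point. Only points that
    give every member at least his guaranteed amount are eligible.
    """
    eligible = [u for u in S if all(ui >= di for ui, di in zip(u, d))]
    return max(eligible, key=lambda u: (u[0] - d[0]) * (u[1] - d[1]))


# One and the same set S of attainable utility vectors (hypothetical numbers).
S = [(1.0, 5.0), (2.0, 4.0), (3.0, 3.0), (4.0, 2.0), (5.0, 1.0)]

# Two decision situations that generate S but differ in what each member
# can guarantee himself.
x = nash_choice(S, d=(0.0, 2.5))  # member 2 has the strong fallback
y = nash_choice(S, d=(2.5, 0.0))  # member 1 has the strong fallback

print(x, y)
```

Both situations present the group with the same S, yet the first selects x and the second selects y. Since x and y are both in S, this gives x >' y and y >' x together, which is the violation described above.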

Given H1, the dominance condition H3 appears to involve little that
is  controversial.  It  includes,  of  course,  the  unanimity  condition  as  a 
special  case. 

The continuity axiom H4 introduces a rough kind of comparison between
the utilities of different individuals, in the sense that a small utility
difference for one individual will not be construed as "infinitely greater"
than any utility difference for another. In this regard, it rules out an
absolute dictator. It does not rule out, of course, a "dictator in fact";
one individual could still have an overwhelming influence on most decisions.

The Archimedean axiom H5 in its present form is difficult to interpret
directly in terms of group behavior. It involves potentially unlimited
sequences of decisions, something difficult to set up in the laboratory or
in real life. It can be replaced by a less global assumption concerning
limited variation of C(S) in the vicinity of a given point (so-called
Lipschitz conditions^9). However, the local assumption is not much more
perspicuous than the global one. Roughly speaking, the assumption holds that
the difference between x and y, if x is "barely preferred" to y, is negligibly
greater than the difference if y is "barely preferred" to x.*

* Continuity and Archimedean axioms are difficult to justify on direct
grounds. They are usually easier to justify in terms of the consequences
they generate for observable phenomena. In the present case, there is a
certain awkwardness in this fact. As will be amplified in Section 8,
H1-H5, and H6 which will be introduced shortly, do not characterize most
group decision processes now in use. They have the status more of recommendations
for improved group decision procedures. Under these circumstances
there are no "observable phenomena" to explain. In the absence of
figures of merit, about all that can be appealed to is the "face validity"
of the consequences.

H1-H5, as we have seen, have the consequence that there is a complete
ordering on U, which is an extension of >*. We need only one more assumption
to specify the form of the group utility function completely. The
additional assumption is

H6. If x ~ y, then x ~ (x,y|E).

Here ~ means equivalent according to the group preference relation.

The verbal justification of H6 is fairly obvious: if the group is
indifferent between two situations x and y, then there appears to be no
reason why it should prefer either one to selection of one by a chance
mechanism. The verbal justification appears highly persuasive when applied
to individual decisions. However, there are interesting new issues that
arise for groups. Returning to Fig. 46, suppose the group is indifferent
between u and v (e.g., the two individuals are co-equals, and the group cannot
distinguish between one or the other getting a given reward). It still
might be the case that tossing a coin to determine who would obtain the reward
is preferable (to the group) to awarding it by fiat to one individual.

Tossing a coin is a traditional way of settling the problem of fair
division in the case of indivisible objects. The fairness concept involved
is closely related to the "Buridan's Ass" issue for groups. If the group
is indifferent among a set of situations, how is it to select one? Even
though the alternatives are equivalent as far as the group is concerned, a
method of selection which is biased can lead to manifest unfairness. As
an obvious case in point, suppose the rule were: in the case of equivalent
allocations to individual 1 or to individual 2, always make the favorable
allocation to individual 1. A random assignment is clearly more desirable
than the biased one. If the requirement for avoiding bias in selecting
among equivalent alternatives is included on the ground floor, so to speak,
then H6 is apparently untenable.

The same kind of problem arises, but in a milder form, with the
Buridan's Ass puzzle for individuals. Suppose a given individual has
several actions which have equal expected utility. How should he pick
the action to pursue? It is not difficult to design biased methods of
selection which over the course of several decisions could result in
unfortunate circumstances. An equal probability mixture of the equivalent
actions would be preferable to the biased rule. However, if this bit of
prudence were included in the formal definition of preferred action, the
consequence would be that the individual would exhibit a positive preference
for gambling; the standard utility axioms would not hold. For individuals,
then, it seems desirable not to include bias reduction procedures in the
elementary framework of utility theory. Random selection among equivalent
actions is a useful rule to add after the basic definition of equivalence.

The desirability of avoiding bias in selection among equivalent outcomes
for groups is perhaps stronger than for individuals. However, the same
resolution appears to be applicable in both cases. Given a definition of
equivalent cases, then, to avoid bias, the additional rule can be imposed to
select one out of the equivalent set at random. In this regard, the fairness
rule would be non-Archimedean. The random selection is preferred to any
member of the equivalent set, but is not preferred to any alternative which
is preferred in a primary way to any alternative in the set.

With this somewhat extended advocacy of H6, we can turn to the implications,
which are rather far reaching. H1-H6, plus the assumption that U is
a utility space, lead to the theorem:

Theorem 4. There is a group utility function G on U, G(x) > G(y) if
and only if x (>) y, and G consists of a weighted sum of the
individual utilities, i.e., $G(x) = \sum_i a_i U_i(x)$, where the
$a_i$ are constants.

The full proof of Theorem 4 is given in Appendix IV. The essence is
straightforward. H6 implies that the equivalence sets on U are convex,
hence they are bounded both above and below by hyperplanes. Since the bounding
hyperplanes cannot intersect, they must be parallel. The defining equations
of these hyperplanes, $\sum_i a_i u_i = c$, also specify the equivalence sets.
$\sum_i a_i u_i$ is a utility function which fulfills the conditions: (a) x (>) y if
and only if G(x) > G(y), (b) $G((x,y|E)) = P(E)\,G(x) + (1 - P(E))\,G(y)$.*
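
As an informal check of condition (b) and of H6, the following sketch evaluates a weighted sum on a few utility vectors. The weights, the vectors, and the helper names `group_utility` and `mix` are assumptions chosen for illustration. A gamble (x,y|E) is represented by its vector of expected individual utilities, as individual utility theory permits.

```python
def group_utility(weights, u):
    """Weighted sum G(u) = sum_i a_i * u_i."""
    return sum(a * ui for a, ui in zip(weights, u))


def mix(x, y, p):
    """Utility vector of the gamble (x, y | E) with P(E) = p."""
    return tuple(p * xi + (1 - p) * yi for xi, yi in zip(x, y))


a = (2.0, 1.0)   # hypothetical weights
x = (1.0, 4.0)   # G(x) = 6
y = (3.0, 0.0)   # G(y) = 6, so the group is indifferent between x and y
z = (0.0, 0.0)   # G(z) = 0

# H6: a gamble between two equivalent points is equivalent to either one.
gamble_xy = mix(x, y, 0.25)
print(group_utility(a, x), group_utility(a, y), group_utility(a, gamble_xy))

# Condition (b): G of a gamble is the expectation of G.
p = 0.5
lhs = group_utility(a, mix(x, z, p))
rhs = p * group_utility(a, x) + (1 - p) * group_utility(a, z)
print(lhs, rhs)
```

The equivalence set through x and y is the line $2u_1 + u_2 = 6$, so any probability mixture of the two stays on it.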

The constants $a_i$ have been interpreted in the literature as weights
representing the relative importance which the group places on the individual
utilities. However, until some more determinate structure is placed on the
individual utility scales, the most that can be said is that the constants
$a_i$ act as adjustments on the individual utility scales, where the adjustments
may or may not include assessments of relative importance. This topic
will be explored further in Section 8.

* Theorem 4 appears to be somewhat more general than a related result due to
Harsanyi.^^ He has shown that if a group utility function exists on a
utility space, which fulfills unanimity for equivalence, then the group
utility function must be of the form $\sum_i a_i u_i$.

7. Minimizing Regret

One illuminating way to examine the import of the weighted sum group
utility is in terms of regret. The regret of individual i, if the group
selects point g in some set S, is

$$R_{ig} = \max_{u \in S} u_i - U_i(g) \qquad (1)$$

That is, the regret of individual i if the group selects g in S is just the
maximum utility the individual could receive from any point in S minus the
utility he obtains from g.**

** The notion of regret appears to have been promulgated by Savage, although
he occasionally seems to prefer to saddle someone else with the idea.

The weighted sum of the individual regrets, $\sum_i a_i R_{ig}$, is a measure of
the degree to which the group decision satisfies the individual members.
The weights in this expression have the same interpretation as the weights
in the weighted sum utility function. From (1),

$$\sum_i a_i R_{ig} = \sum_i a_i \max_{u \in S} u_i - \sum_i a_i U_i(g) \qquad (2)$$

$$= M - U(g) \qquad (3)$$

where M abbreviates the first term in (2) and $U(g) = \sum_i a_i U_i(g)$. Since M
does not involve g, the minimum of the weighted average regret is obtained
if the second term in (3) is maximized, i.e.,

$$\min_g \sum_i a_i R_{ig} = M - \max_g U(g) \qquad (4)$$

(4) establishes the theorem:

Theorem. Minimizing the weighted sum regret is equivalent to maximizing
the weighted sum of the individual utilities.

Although  the  theorem  is  not  very  deep,  it  can  be  considered  as  a second 
route  for  arriving  at  the  weighted  sum  of  individual  utilities  as  a group 
utility. 
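
The equivalence can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: S is taken to be a finite list of utility vectors, and the weights and helper names (`weighted_regret`, `weighted_sum`) are hypothetical.

```python
def weighted_sum(weights, g):
    """U(g) = sum_i a_i * g_i."""
    return sum(a * gi for a, gi in zip(weights, g))


def weighted_regret(S, weights, g):
    """sum_i a_i * R_ig, where R_ig = (best u_i available in S) - g_i."""
    best = [max(u[i] for u in S) for i in range(len(weights))]
    return sum(a * (b - gi) for a, b, gi in zip(weights, best, g))


S = [(0.0, 6.0), (2.0, 5.0), (4.0, 3.0), (5.0, 0.0)]  # hypothetical outcomes
a = (2.0, 1.0)                                        # hypothetical weights

min_regret_point = min(S, key=lambda g: weighted_regret(S, a, g))
max_utility_point = max(S, key=lambda g: weighted_sum(a, g))

# M is the weighted sum of each member's best attainable utility in S.
M = weighted_sum(a, [max(u[i] for u in S) for i in range(len(a))])
for g in S:
    # Weighted regret and M - U(g) agree for every g, as in (3).
    print(g, weighted_regret(S, a, g), M - weighted_sum(a, g))
```

Because the weighted regret differs from the negative of the weighted sum only by the constant M, the two criteria rank every point of S identically.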

The approach to group decisions via minimizing regret resembles an
approach called the "theory of the displaced ideal" by Zeleny.^12 For any
S, define $u_m$ by

$$u_{mi} = \max_{u \in S} u_i$$

that is, $u_m$ consists of the best possible outcome in S for each individual
separately. Generally $u_m$ will not be in S. If it is, unanimity legislates
that $u_m$ will be in C(S). If $u_m$ is not in S, then C(S) is defined to be
the set of points in S which are nearest to $u_m$; i.e.,

$$C(S) = \{u \mid u \in S \text{ and } d(u, u_m) = \min_{v \in S} d(v, u_m)\} \qquad (5)$$

$u_m$ is considered to be the "ideal" point which the group would like to
attain, but usually cannot. If $u_m$ is not attainable, i.e., not in S, then
the group selects the point that is as close as possible to it. Various


decision rules are obtained by using various definitions of distance. If
d(u,v) is taken to be

$$d(u,v) = \Bigl|\sum_i a_i (u_i - v_i)\Bigr| \qquad (6)$$

then (5) defines the same outcomes as the min regret rule (4). d(u,v)
defined as in (6) is not a true distance. It fulfills all the conditions
(D1-D3, Chapter II) for a distance except d(u,v) = 0 implies u = v. If u
and v are on the hyperplane $\sum_i a_i u_i = c$, d(u,v) = 0, even though $u \neq v$. The
d(u,v) of (6) can be thought of as a generalized form of distance. It is a
member of the family of "distances" which can be generated by complete orderings
on U. Specifically, if > is a complete ordering on U which can be
represented by a real-valued function f, i.e., u > v if and only if
f(u) > f(v), then the generalized distance defined by f is just
$d_f(u,v) = |f(u) - f(v)|$.

The displaced ideal approach with true distances will generally lead
to violations of acyclicity. I have not proved this in complete generality;
however, Fig. 47 shows a typical case. For the large set S, x is the point
closest (ordinary Euclidean distance) to the ideal point $u_m$ for S. If a

subset  S'  is  selected  where  u^2  “ ^2’  C(S')  = y.  We  thus  have  x >'  y 

and  y >'  x. 
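
A cycle of this kind can be reproduced with a few lines of code. The sketch below does not reconstruct Fig. 47. It assumes small finite sets in place of the convex sets of the text, and the points, weights, and function names are hypothetical. It also applies the generalized distance of form (6) to the same sets for comparison.

```python
import math


def ideal_point(S):
    """u_m: the best attainable utility for each individual separately."""
    return tuple(max(u[i] for u in S) for i in range(len(S[0])))


def euclidean(u, v):
    """Ordinary Euclidean distance, a true distance."""
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))


def weighted_sum_distance(a):
    """Generalized 'distance' of form (6) for hypothetical weights a."""
    def d(u, v):
        return abs(sum(ai * (ui - vi) for ai, ui, vi in zip(a, u, v)))
    return d


def displaced_ideal_choice(S, distance):
    """Point of S nearest to the ideal point of S, as in (5)."""
    u_m = ideal_point(S)
    return min(S, key=lambda u: distance(u, u_m))


x, y, e = (6.0, 6.0), (8.0, 5.0), (0.0, 10.0)
S = [x, y, e]      # ideal point is (8, 10)
S_prime = [x, y]   # dropping e lowers the ideal point to (8, 6)

euclid_S = displaced_ideal_choice(S, euclidean)
euclid_S_prime = displaced_ideal_choice(S_prime, euclidean)

d6 = weighted_sum_distance((1.0, 1.0))
gen_S = displaced_ideal_choice(S, d6)
gen_S_prime = displaced_ideal_choice(S_prime, d6)

print(euclid_S, euclid_S_prime, gen_S, gen_S_prime)
```

With Euclidean distance the choice switches from x in S to y in S', although both points are available in both sets. With the weighted-sum distance the same point is chosen from S and from S'.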

On the other hand, if the displaced ideal approach is taken, using generalized
distances as defined above, then the group decision will not be
cyclic. I haven't proved this with complete generality either, but it
appears plausible if the class of complete orders is restricted to those
generated by H1-H5.

Theorem. The complete order (>) implied by conditions H1-H5 can be
represented by a real-valued function f, such that u (>) v
if and only if f(u) > f(v).


Proof: Select any positive ray R. Every equivalence set B
intersects R at some point. These intercepts map the set of
equivalence sets onto the continuum of points on R. Any real-valued
function f which is monotonic on R (that is, given x and
y on R, f(x) > f(y) if and only if $x_i > y_i$ for all i) is an
appropriate representation.

In the present context, H6 restricts d(u,v) to the form (6).

One happy feature of the min weighted regret criterion, then, is that
it does not violate acyclicity (or analogously, does not violate independence
of irrelevant alternatives). Other notions based on regret, such as min max
regret, do violate these conditions.

8. Note on Establishing Weights

The preceding discussion appears to give a fair amount of support to the
assumption that the weighted sum of individual utilities is a reasonable
form of group utility function for many group decisions. With that assumption,
the only major issue arising in practice is the determination of the
weights. In theory this is not a deep problem. Given fixed utility scales
for the individuals (i.e., specific assignments of numbers to the individual
reference objects), weights can be obtained empirically. The group makes
an appropriately large number of choices among potential outcomes, each of
which is representable by a vector $(u_1, \ldots, u_n)$ of individual utility ratings,
and the optimal linear regression of the group choice against the individual
utilities determines the (implicit) weights underlying the group choice.*
In this respect, determining the individual weights is no more abstruse than
eliciting an individual's subjective probabilities, or determining an individual's
utility function on a given set of objects.

* If the group choice is expressed in a set of binary choices between pairs of
outcomes, the computation is more intricate than elementary linear regression,
but no new conceptual difficulties are introduced.
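
The regression step can be sketched as follows. The example rests on assumptions not made in the text: the group's choices are taken to be numerical ratings of each outcome (not binary choices), the data are simulated from hypothetical "true" weights plus a little noise, and ordinary least squares without an intercept is used. All names are illustrative.

```python
import random


def solve_linear(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (M[r][n] - tail) / M[r][r]
    return w


def fit_weights(utilities, group_ratings):
    """Least-squares weights a minimizing sum_k (rating_k - sum_i a_i u_ki)^2."""
    n = len(utilities[0])
    # Normal equations: (U^T U) a = U^T g
    UtU = [[sum(u[i] * u[j] for u in utilities) for j in range(n)] for i in range(n)]
    Utg = [sum(u[i] * g for u, g in zip(utilities, group_ratings)) for i in range(n)]
    return solve_linear(UtU, Utg)


random.seed(0)
true_weights = [0.5, 1.5, 1.0]  # hypothetical implicit weights of a group of three
utilities = [[random.uniform(0, 10) for _ in range(3)] for _ in range(40)]
group_ratings = [
    sum(a * u for a, u in zip(true_weights, row)) + random.gauss(0, 0.1)
    for row in utilities
]

estimated = fit_weights(utilities, group_ratings)
print(estimated)
```

With forty rated outcomes and modest noise, the estimated weights come out close to the weights used to generate the ratings. With real groups there is no guarantee that the ratings fit a weighted sum at all.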

The  difficulty  is  that,  in  practice,  most  groups  do  not  have  a well- 
defined  decision  process;  or  if  they  do,  no  one  would  expect  that  the 
process  would  fit  conditions  H1-H6.  As  mentioned  earlier,  the  most  widely 
utilized  group  decision  process  is  some  variant  on  the  dictatorial,  which 
is  unlikely  to  fulfill  the  continuity  condition.  Groups  that  rely  heavily 
on  non-anchored  voting  schemes  are  likely  to  violate  the  acyclicity  condition. 

If the weighted sum group utility is interpreted not as a good approximation
to actual group practice, but as a recommendation for an improved
group  decision  procedure,  then  the  issue  of  determining  weights  requires 
additional  assumptions  beyond  H1-H6. 

There  are  two  additional  criteria  which  have  been  given  some  attention 
in  the  literature.  These  could  be  labelled  the  equity  principle  and  the 
merit  principle.  Roughly,  the  equity  principle  states  that  (without  some 
special  reason  to  the  contrary)  the  weights  should  be  equal.  The  merit 
principle  holds  that  individuals  should  be  weighted  in  accord  with  some 
measure of their worth to the group, where worth may mean either contribution
to the group utility, or some intrinsic value, or both.

Before  either  of  these  criteria  can  be  stated  precisely,  there  is  a 
technical  hurdle  to  overcome.  The  simplest  statement  of  the  hurdle  can  be 
made  in  reference  to  the  equity  principle;  namely,  what  is  meant  by  equal 
weights?  Since  the  individual  utility  scales  are  determined  only  up  to  a 
linear  transformation,  attaching  equal  weights  to  any  particular  set  of 
individual  utility  scales  has  no  special  significance. 

If each individual's utility assignments over the set of outcomes in all
decisions were just a linear transformation of any other individual's
assignments, then there would be no problem. The various individual scales
could simply be rescaled to coincide, and equal weights would then have
precise meaning. This is the kind of happy situation which makes the use
of many different thermometric substances for the measurement of temperature
feasible in physics.

Individual utilities are usually not simple linear transformations of
each other over the outcomes in a decision situation. If they were,
unanimity would be the only decision rule required for groups. In a typical
decision we can expect both differential rewards across different outcomes
and possibly different evaluations of comparable rewards, a mixture
of disagreement on interests and values. These two can be disentangled by
stepping outside the decision situation. As a gedanken experiment, suppose
it were possible to formulate a description of all possible conditions
which any individual in the group might experience in any potential outcome
of any potential decision situation. In order to make the descriptions of
the conditions complete, it would be necessary to include not only the
explicit disposition of rewards which define the outcomes, but also any
contextual factors which might influence the relative evaluations of the
rewards. To use a cliche example, if one member of the group is a pauper,
and another a prince, and in some decision $1,000 is awarded to some individual,
then the list would include being a pauper and receiving $1,000,
and being a prince and receiving $1,000.

We now ask each individual to formulate his utility function on this
set of hypothetical conditions. Leaving aside the question whether anyone
could actually finish this assignment (the axioms of individual utility
theory say anyone can do it!) we ask how the various utility functions
compare. If by some stroke of luck, the utility scales are linear transformations
of each other, then for that group, the assumption that they are
identical (when rescaled to coincide) seems to be a reasonable basis for
defining equal weights. On the rather scanty experimental evidence now
available, there is no guarantee that the utility scales of different individuals
will be linearly related. They could, for example, look like the two
curves in Fig. 48.

The assumption of identical utilities if the scales are mutual linear
transforms is a convention that seems innocuous since the group still has
the option of establishing unequal weights for the group utility function.
But there does not appear to be a "natural" convention for the case of nonlinearly
related utilities. If a pair of conditions out of the master list
is selected as the group reference points, and the individual utility scales
are normalized to coincide on those points, then different rescalings will
result from different pairs of reference points. At the present time there
does not appear to be any theoretical basis for distinguishing potential
reference points.*

* If the doctrine of bounded utility were generally accepted, and there were
a method of identifying the least upper bounds of the individual utilities,
then at least one reference point could be obtained by the convention that
the least upper bound of one individual represents the same amount of utility
as the least upper bound of any other individual.

One straightforward convention that at least has the advantage of not
requiring selecting an arbitrary pair of reference points is a minimal
discrepancy rescaling. That is, constants $r_i$, $s_i$ are computed for each
individual i so that

$$\sum_{i,j} d(U'_{ij}, \bar{U}_j)$$

is minimized, where $U'_{ij} = r_i + s_i U_{ij}$, $U_{ij}$ is individual i's utility assessment
of object j, and $\bar{U}_j$ is an appropriate average of the U's for object j.
Various distance functions d might be appropriate, such as difference squared,
absolute difference, absolute difference of logs, etc., depending on the
shape of the utility functions.

The  minimal  discrepancy  rescaling  is,  so  to  speak,  a best  approximation 
to  equating  the  utility  scales.  Starting  with  the  rescaled  individual  utility 
functions,  the  notions  of  equal  weights  or  differential  weights  are  at  least 
given  a precise  meaning. 
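
A minimal sketch of such a rescaling follows, under assumptions that go beyond the text: the distance is taken to be the squared difference, the "appropriate average" is taken to be the plain mean of the unrescaled assessments for each object, and each individual is fitted separately by ordinary least squares. The function names and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares (r, s) minimizing sum_j (r + s * x_j - y_j)^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s = sxy / sxx
    r = mean_y - s * mean_x
    return r, s


def minimal_discrepancy_rescaling(U):
    """U[i][j] is individual i's utility for object j.

    Returns one (r_i, s_i) pair per individual, chosen so that
    r_i + s_i * U[i][j] is as close as possible (in squared error)
    to the mean assessment of object j.
    """
    n_objects = len(U[0])
    target = [sum(row[j] for row in U) / len(U) for j in range(n_objects)]
    return [fit_line(row, target) for row in U]


# Two individuals whose scales happen to be linear transforms of each other.
U = [
    [0.0, 1.0, 2.0, 3.0],
    [10.0, 30.0, 50.0, 70.0],
]
constants = minimal_discrepancy_rescaling(U)
rescaled = [[r + s * u for u in row] for (r, s), row in zip(constants, U)]
print(constants)
print(rescaled)
```

In this example the two scales are exact linear transforms of each other, so the rescaled assessments coincide. With nonlinearly related scales a residual discrepancy remains, and its size depends on the distance function and on the comparison set.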

There are, of course, pitfalls in any practical applications of empirical
scaling methods like the minimal discrepancy procedure. In practice,
relatively small subsets of potential outcomes are likely to be used as
comparison sets, in which case it is possible that a rescaling derived from
one set of objects would not match a rescaling based on a different set.
The problems of identifying and controlling for contextual variables are,
to put it mildly, severe; e.g., the extent to which an individual has
divorced his utility judgments from his present circumstances is difficult
to evaluate from his overt responses. These are problems which are common
in many fields of social measurement.

APPENDIX I. Proof that if f(1/n) + f(1/m) = f(1/nm) then f(x) = k log x.

One additional assumption is needed, namely that f(x) is continuous.
Set f(1/n) = f'(n). We then have f'(n) + f'(m) = f'(nm) for any positive
integers n and m. Now set f'(x) = g(z), where z = log x. We then have
g(z) + g(w) = g(z+w), from which we obtain g(nz) = ng(z), or g(z) = ng(z/n).
Iterating, we have g((m/n)z) = (m/n)g(z). If f is continuous, then g is continuous,
and we have g(xz) = xg(z) where x is any real number. Set z = 1,
and g(1) = h. Then g(x) = hx, and f'(x) = h log x. Since f'(1/x) = f(x) =
-h log x, f(x) = k log x, which was to be proved.

APPENDIX II. Proof that H1-H5 imply the existence of a complete order on X.


The proof follows a slightly different route from the sketch presented in the text. The basic point is to show that the relation >* can be extended to a relation (>) which is transitive and connected. The notation is the same as that of Section 8, Chapter IV.

Let x|y mean that neither x >* y nor y >* x holds; that is, x is "disconnected" from y with respect to >*.

Definition 1. x (>) y means either x >* y or x|y.

Lemma 1: (>) is connected.

Proof: Immediate, since either x >* y or y >* x or neither.

Since (>) is connected, all that needs demonstration is that it is transitive. This is equivalent to showing that (>) is acyclic, as Theorem 1 shows.

Theorem 1: If a relation > is connected and acyclic, it is transitive.

Proof: Assume x > y > z. Since > is connected, either x > z or z > x. If x > z, the theorem holds. If not x > z, then z > x. But z > x is rejected by acyclicity.
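Theorem 1 can be confirmed by exhaustive search on a small set. The following is a minimal sketch under stated assumptions: the relation is treated as a set of ordered pairs of distinct items, "connected" means every pair is related in at least one direction, and "acyclic" means no directed cycle of any length. The four-element set and all function names are hypothetical choices for illustration.

```python
from itertools import combinations, permutations, product

ITEMS = ("a", "b", "c", "d")  # hypothetical four-element set
PAIRS = list(combinations(ITEMS, 2))

def is_connected(rel):
    """Every pair of distinct items is related in at least one direction."""
    return all((x, y) in rel or (y, x) in rel for x, y in PAIRS)

def is_acyclic(rel):
    """No directed cycle x1 > x2 > ... > xk > x1 through distinct items."""
    for k in range(2, len(ITEMS) + 1):
        for cycle in permutations(ITEMS, k):
            if all((cycle[i], cycle[(i + 1) % k]) in rel for i in range(k)):
                return False
    return True

def is_transitive(rel):
    """x > y and y > z imply x > z, for distinct x and z."""
    return all((x, z) in rel
               for (x, y) in rel for (y2, z) in rel
               if y == y2 and x != z)

def all_relations():
    """Enumerate every irreflexive relation on ITEMS (4 options per pair)."""
    for choice in product((0, 1, 2, 3), repeat=len(PAIRS)):
        rel = set()
        for (x, y), c in zip(PAIRS, choice):
            if c in (1, 3):  # x > y
                rel.add((x, y))
            if c in (2, 3):  # y > x
                rel.add((y, x))
        yield rel

qualifying = [r for r in all_relations() if is_connected(r) and is_acyclic(r)]
counterexamples = [r for r in qualifying if not is_transitive(r)]
print(len(qualifying), len(counterexamples))
```

On four items the connected, acyclic relations found this way are exactly the 24 strict linear orders, and none fails transitivity. Dropping connectedness breaks the conclusion: the relation {a > b, b > c} is acyclic but not transitive.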

Thus, all that needs to be shown is that (>) is acyclic, i.e., x (>)* y implies not y >* x. By definition, x (>)* y implies there is a chain x_i, i = 1, ..., n, with x = x_1, y = x_n, such that either x_i >* x_{i+1} or x_i|x_{i+1}. If all the links in this chain are of the form x_i >* x_{i+1}, then H2 applies and not y >* x.

The only open case, then, is that in which one or more links are of the form x_i|x_{i+1}. The term "strictly dominates" will be used in a narrow sense; x strictly dominates y means x_i > y_i for all i.

Lemma 2: If x >* y, then there is a z which strictly dominates y and x >* z.



Proof: Let w be any point which strictly dominates both x and y. Then w >* x >* y. By H4 there is a b, 0 < b < 1, such that x >* bw + (1-b)y. Set z = bw + (1-b)y. z is on the positive ray determined by w and y. Since z ≠ y, z strictly dominates y.
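The geometric part of this proof can be illustrated with a small computation. This is a sketch only: the vectors w and y and the weight b are made-up values, and the preference step (x >* z, supplied by H4) is not something code can check. It shows that a mixture z = bw + (1-b)y with 0 < b < 1 strictly dominates y whenever w does.

```python
from fractions import Fraction as F

def mix(b, w, y):
    """Componentwise mixture b*w + (1-b)*y."""
    return tuple(b * wi + (1 - b) * yi for wi, yi in zip(w, y))

def strictly_dominates(u, v):
    """u strictly dominates v when u_i > v_i for every component i."""
    return all(ui > vi for ui, vi in zip(u, v))

# Hypothetical utility vectors for three group members.
w = (F(5), F(7), F(4))
y = (F(1), F(2), F(3))
b = F(1, 4)

z = mix(b, w, y)
print(z, strictly_dominates(w, y), strictly_dominates(z, y))
```

Each component of z exceeds the matching component of y by b(w_i - y_i), which is positive because b > 0 and w_i > y_i.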

Lemma 3: If x >* y and z strictly dominates x, then there is a w which strictly dominates y, and z >* w.

Proof: Since z >* x, acyclicity rejects y >* z. But not y|z, since z >* y. Hence z >* y, and Lemma 2 applies.

Definition 2: P_x = {y | y >* x}. Q_x = {y | x >* y}.

The definition is simply a reminder of the meaning of P_x and Q_x introduced in the text.

Lemma 4: Let x be any point and R any positive ray not containing x. R intersects P_x and Q_x, and the g.l.b. of P_x on R and the l.u.b. of Q_x on R both exist.

Proof: R can be specified by the condition y = w + ts, where w is any point on R, s is a vector of positive numbers 0 < s_i < 1, and t is any real number, positive or negative. Each value of t specifies a point y on R. Given any x, there is some t such that w_i + ts_i > x_i for all i, and a t' such that w_i + t's_i < x_i for all i. t defines a point u which dominates x, and hence is in P_x, and t' defines a point v which is dominated by x, and hence is in Q_x. The points on R are completely ordered by dominance, and u is an upper bound for Q_x, and v is a lower bound for P_x. Thus there is a g.l.b. for P_x on R and a l.u.b. for Q_x.
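The existence of the parameters t and t' in this proof is constructive. As a minimal sketch, assuming made-up values for w, s, and x and a hypothetical helper named bracketing_parameters, one can compute them directly: since every s_i is positive, w_i + t s_i > x_i holds exactly when t > (x_i - w_i)/s_i.

```python
from fractions import Fraction as F

def point_on_ray(w, s, t):
    """The point y = w + t*s on the positive ray through w with direction s."""
    return tuple(wi + t * si for wi, si in zip(w, s))

def strictly_dominates(u, v):
    """u strictly dominates v when u_i > v_i for every component i."""
    return all(ui > vi for ui, vi in zip(u, v))

def bracketing_parameters(w, s, x, margin=1):
    """Return (t_low, t_high): the ray point at t_high strictly dominates x,
    and x strictly dominates the ray point at t_low. Requires every s_i > 0."""
    ratios = [(xi - wi) / si for wi, si, xi in zip(w, s, x)]
    return min(ratios) - margin, max(ratios) + margin

# Hypothetical two-member example; x does not lie on the ray.
w = (F(0), F(0))
s = (F(1, 2), F(1, 4))
x = (F(3), F(1))

t_low, t_high = bracketing_parameters(w, s, x)
u = point_on_ray(w, s, t_high)  # dominates x, so it lies in P_x
v = point_on_ray(w, s, t_low)   # dominated by x, so it lies in Q_x
print(t_low, t_high, u, v)
```

The margin is an arbitrary positive number; any t above the largest ratio and any t' below the smallest ratio would serve equally well.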

H5 specifies that for any positive ray R, g.l.b. P_x = l.u.b. Q_x. Hence the boundaries of P_x and Q_x coincide, and can be referred to by one notation, B_x, the boundary of P_x and Q_x. The next lemma determines that the boundary is unique.

Lemma 5: If y is the intersection of B_x and some positive ray R, then for any other positive ray R' passing through y, y is the intersection of B_x and R'.

Proof: Assume some u on R', u ≠ y, is the intersection of B_x and R'. u is either above y or below it on R'. Suppose u is above y. Then u dominates some u' on R' which in turn dominates some v on R that dominates y. Hence u cannot be the g.l.b. of P_x on R'. A similar argument involving Q_x rejects u below y.

Lemma 6: If x|y, then x is in B_y.

Proof: By hypothesis, x is not in P_y nor in Q_y. Thus, x is in B_y.

Lemma 7: x|y implies that for any z which strictly dominates x, there is a w that strictly dominates y, and z >* w.

Proof: Since z strictly dominates x, and x|y, not y >* z. But by Lemma 6, not y|z, otherwise both x and z are in B_y and on a common positive ray. Hence z >* y, and Lemma 2 applies.

Lemma 8: x (>)* y implies for every z which strictly dominates x, there is a w which strictly dominates y, and z >* w.

Proof: The lemma follows from Lemmas 3 and 7, and induction on the number of links in the chain from x to y.

Lemma 9: (>) is acyclic.

Proof: If x (>)* y, and y >* x, then from Lemma 2, there is a z which strictly dominates x and y >* z. But from Lemma 8, there is a w which strictly dominates y, and z >* w, thus y >* z >* w >* y, which violates H2.

Lemma 9 completes the proof that (>) is a complete order on X.

APPENDIX III. Proof that complete independence ($D_R = 1$ and $D_{R|E} = 1$) implies that for two events, $P(E|R_i) = P(E)$ for all i but one.

The proof proceeds by induction on the number of respondents. For two events, $E$ and $\bar{E}$, and two responses, $R_1$ and $R_2$, set $P(E) = p$, $P(\bar{E}) = 1-p$, $P(R_1|E) = q_1$, $P(R_1|\bar{E}) = r_1$, $P(R_2|E) = q_2$, and $P(R_2|\bar{E}) = r_2$. From the rule of elimination, $P(R_1) = pq_1 + (1-p)r_1$ and $P(R_2) = pq_2 + (1-p)r_2$. $D_R = 1$ implies $P(R_1.R_2) = P(R_1)P(R_2) = (pq_1 + (1-p)r_1)(pq_2 + (1-p)r_2)$. $D_{R|E} = 1$ implies $P(R_1.R_2|E) = q_1q_2$, and $P(R_1.R_2|\bar{E}) = r_1r_2$; whence, from the rule of elimination again, $P(R_1.R_2) = pq_1q_2 + (1-p)r_1r_2$. Equating these two expressions for $P(R_1.R_2)$ and expanding, the common factor $p(1-p)$ cancels, leaving

$$q_1q_2 + r_1r_2 = q_1r_2 + q_2r_1$$

which can be factored

$$(q_1 - r_1)(q_2 - r_2) = 0$$

Thus, either $q_1 = r_1$ or $q_2 = r_2$. Assume $q_1 = r_1$. Then, from the theorem of Bayes,

$$P(E|R_1) = \frac{pq_1}{pq_1 + (1-p)r_1} = p.$$
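The algebra of the two-respondent case can be verified with exact rational arithmetic. The sketch below is an illustration, not part of the original proof; the function names and the grid of sample probabilities are assumptions. It checks that the difference between the two expressions for $P(R_1.R_2)$ equals $p(1-p)(q_1 - r_1)(q_2 - r_2)$, and that the posterior equals the prior when $q_1 = r_1$.

```python
from fractions import Fraction as F
from itertools import product

def joint_marginal(p, q1, r1, q2, r2):
    """P(R1.R2) when the responses are unconditionally independent."""
    return (p * q1 + (1 - p) * r1) * (p * q2 + (1 - p) * r2)

def joint_conditional(p, q1, r1, q2, r2):
    """P(R1.R2) when the responses are independent given E and given not-E."""
    return p * q1 * q2 + (1 - p) * r1 * r2

def gap(p, q1, r1, q2, r2):
    """Difference between the two expressions for P(R1.R2)."""
    return joint_conditional(p, q1, r1, q2, r2) - joint_marginal(p, q1, r1, q2, r2)

def posterior(p, q1, r1):
    """Bayes' theorem: P(E | R1)."""
    return p * q1 / (p * q1 + (1 - p) * r1)

# Hypothetical grid of probabilities 1/5, 2/5, 3/5, 4/5 for every parameter.
GRID = [F(k, 5) for k in range(1, 5)]
identity_holds = all(
    gap(p, q1, r1, q2, r2) == p * (1 - p) * (q1 - r1) * (q2 - r2)
    for p, q1, r1, q2, r2 in product(GRID, repeat=5)
)
print(identity_holds, posterior(F(3, 10), F(2, 5), F(2, 5)))
```

Because the gap factors this way, both independence conditions can hold together (for 0 < p < 1) only when at least one respondent has $q_i = r_i$, which is the case in which that respondent's answer leaves the probability of E unchanged.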

Assume the theorem holds for n-1 respondents. If complete independence holds for n respondents, then it holds for n-1, as shown by the following, where $q_i = P(R_i|E)$ and $r_i = P(R_i|\bar{E})$. Complete independence implies (where $(1-q_n) = P(\bar{R}_n|E)$, $(1-r_n) = P(\bar{R}_n|\bar{E})$)

$$p \prod_{i \ne n} q_i \,(1-q_n) + (1-p) \prod_{i \ne n} r_i \,(1-r_n) = \prod_{i \ne n} (pq_i + (1-p)r_i)\,(p(1-q_n) + (1-p)(1-r_n))$$

Expanding both sides,

$$p \prod_{i \ne n} q_i + (1-p) \prod_{i \ne n} r_i - p \prod_{i=1}^{n} q_i - (1-p) \prod_{i=1}^{n} r_i = \prod_{i \ne n} (pq_i + (1-p)r_i)\,(1 - pq_n - (1-p)r_n)$$

$$= \prod_{i \ne n} (pq_i + (1-p)r_i) - \prod_{i=1}^{n} (pq_i + (1-p)r_i)$$

The terms involving $\prod_{i=1}^{n}$ on each side of the equality are equal by the assumption of complete independence, and hence the terms involving $\prod_{i \ne n}$ are equal.

Abbreviate $\prod_{i \ne n} (pq_i + (1-p)r_i)$ by $\Pi$, $\prod_{i \ne n} q_i$ by $q$, and $\prod_{i \ne n} r_i$ by $r$. We have, from complete independence,

$$pqq_n + (1-p)rr_n = \Pi\,(pq_n + (1-p)r_n)$$

$$pq_n(q - \Pi) + (1-p)r_n(r - \Pi) = 0$$

and from complete independence for n-1 respondents

$$pq + (1-p)r = \Pi$$

$$q = \frac{\Pi - (1-p)r}{p}$$

whence

$$pq_n\left(\frac{\Pi - (1-p)r}{p} - \Pi\right) + (1-p)r_n(r - \Pi) = 0$$

$$(1-p)q_n(\Pi - r) + (1-p)r_n(r - \Pi) = 0$$

$$r_n - q_n = 0$$

$$r_n = q_n$$

Since, by the inductive hypothesis, at most one of the n-1 pairs $(q_i, r_i)$, $i = 1, \ldots, n-1$, has unequal members, at most one of the n pairs has unequal members.
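The key identity in the inductive step can also be checked exactly. In this sketch (function names and sample grid are assumptions made for illustration), q and r stand for the products over the first n-1 respondents, and $\Pi$ is set to $pq + (1-p)r$ as required by complete independence for n-1 respondents. Under that substitution the left-hand side $pq_n(q - \Pi) + (1-p)r_n(r - \Pi)$ reduces to $p(1-p)(q - r)(q_n - r_n)$.

```python
from fractions import Fraction as F
from itertools import product

def residual(p, q, r, qn, rn):
    """p*qn*(q - Pi) + (1-p)*rn*(r - Pi), with Pi = p*q + (1-p)*r."""
    pi = p * q + (1 - p) * r
    return p * qn * (q - pi) + (1 - p) * rn * (r - pi)

def factored(p, q, r, qn, rn):
    """Closed form that the residual reduces to."""
    return p * (1 - p) * (q - r) * (qn - rn)

# Hypothetical grid of probabilities 1/5, 2/5, 3/5, 4/5 for every parameter.
GRID = [F(k, 5) for k in range(1, 5)]
step_identity_holds = all(
    residual(*args) == factored(*args) for args in product(GRID, repeat=5)
)
print(step_identity_holds)
```

The factored form makes visible that passing from the vanishing residual to $r_n = q_n$ divides by $p(1-p)(q - r)$, so the step tacitly takes 0 < p < 1 and q ≠ r.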
APPENDIX IV. Proof of Theorem 4, Chapter VI.

Lemma 4 in Appendix II shows that the equivalence sets of the group preference function are not single points; in fact they are unbounded; that is, given any $x$ and any $u_i$ there is a $y$ in $B_x$ with $y_i = u_i$. H6 implies that the equivalence sets are convex. Thus for any equivalence set $B_x$ and for any
point $y$ not in $B_x$ there is a hyperplane $H(x,y)$ containing $y$ which bounds $B_x$, i.e., $B_x$ is either entirely above or entirely below the hyperplane. If $y$ is on a positive ray $R$ through $x$, and $x$ dominates $y$, then $B_x$ is entirely above $H(x,y)$. On the other hand, if $z$ dominates $x$, then $H(x,z)$ lies entirely above $B_x$. These two hyperplanes cannot intersect, since within the plane defined by $R$ and any point of intersection $w$ (Fig. 49) there would be a positive ray $R'$, where the intersection of $B_x$ and $R'$ would be above $H(x,y)$, say the point $u$, and below $H(x,z)$, say the point $v$, contrary to the bounding hyperplane condition. Therefore $H(x,y)$ and $H(x,z)$ are parallel. Since $z$ is any point above $x$ on $R$ and $y$ is any point below, $B_x$ must be the hyperplane parallel to $H(x,y)$ through $x$. By a similar argument $B_y$ for any $y$ on $R$ must be the hyperplane through $y$ parallel to the hyperplane $B_x$. There is thus a set of constants $\{a_i\}$ such that $B_x = \{y \mid \sum_i a_i y_i = \sum_i a_i x_i\}$.

The constants $a_i$ are all positive, since the slope of the intersection of any hyperplane with any given coordinate plane is negative. To show that $\sum_i a_i U_i$ is a utility function, let $z = (x, y|E)$. Then

$$\sum_i a_i U_i(z) = \sum_i a_i \big(P(E)U_i(x) + (1-P(E))U_i(y)\big) = P(E)\sum_i a_i U_i(x) + (1-P(E))\sum_i a_i U_i(y).$$
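The final equality is the statement that a weighted sum of expected utilities is itself an expected utility. A minimal numerical sketch follows; the weights, utility values, and probability are made-up numbers and the function names are hypothetical. It evaluates both sides for a three-member group.

```python
from fractions import Fraction as F

def lottery_utility(p_e, u_x, u_y):
    """Individual utility of z = (x, y | E): x if E occurs, otherwise y."""
    return p_e * u_x + (1 - p_e) * u_y

def weighted_sum(a, utilities):
    """Group value: sum of a_i * U_i over the members."""
    return sum(ai * ui for ai, ui in zip(a, utilities))

# Hypothetical positive weights and individual utilities for three members.
a = (F(2), F(1, 2), F(3))
U_x = (F(1), F(4), F(0))
U_y = (F(3), F(-2), F(5))
p_e = F(1, 4)

U_z = tuple(lottery_utility(p_e, ux, uy) for ux, uy in zip(U_x, U_y))
lhs = weighted_sum(a, U_z)
rhs = p_e * weighted_sum(a, U_x) + (1 - p_e) * weighted_sum(a, U_y)
print(lhs, rhs)
```

The two sides agree for any choice of weights because the weighted sum is linear in the individual utilities.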